EFFICIENCY IMPROVEMENTS IN INFERENCE ON STATIONARY
AND NONSTATIONARY FRACTIONAL TIME SERIES

By P. M. Robinson
London School of Economics 

We consider a time series model involving a fractional stochastic 
component, whose integration order can lie in the stationary/invertible 
or nonstationary regions and be unknown, and an additive determin- 
istic component consisting of a generalized polynomial. The model 
can thus incorporate competing descriptions of trending behavior. 
The stationary input to the stochastic component has parametric 
autocorrelation, but innovation with distribution of unknown form. 
The model is thus semiparametric, and we develop estimates of the 
parametric component which are asymptotically normal and achieve 
an M-estimation efficiency bound, equal to that found in work using
an adaptive LAM/LAN approach. A major technical feature which 
we treat is the effect of truncating the autoregressive representation 
in order to form innovation proxies. This is relevant also when the 
innovation density is parameterized, and we provide a result for that 
case also. Our semiparametric estimates employ nonparametric series 
estimation, which avoids some complications and conditions in ker- 
nel approaches featured in much work on adaptive estimation of time 
series models; our work thus also contributes to methods and theory 
for nonfractional time series models, such as autoregressive moving 
averages. A Monte Carlo study of finite sample performance of the 
semiparametric estimates is included. 

1. Introduction. This paper obtains efficient parameter estimates in sta- 
tionary or nonstationary, possibly fractional, time series. Consider a regres- 
sion model given by 

(1.1) $y_t = \mu^T z_t + x_t$, $t \in \mathbb{Z}$,

where $\mathbb{Z} = \{t : t = 0, \pm 1, \ldots\}$, $z_t$ is a deterministic $q \times 1$ vector sequence, $\mu$ is an unknown $q \times 1$ vector, $T$ denotes transposition, $x_t$ is a zero-mean stochastic process and $y_t$ is an observable sequence. Any nonstationarity in the mean of $y_t$ would be due to $z_t$, nonstationarity in variance to $x_t$, but cases when $\mu^T z_t$ is a priori constant and $x_t$ is stationary are also of interest.

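To describe $x_t$, denote by $B$ the back-shift operator, so $Bx_t = x_{t-1}$, and denote by $\Delta = 1 - B$ the difference operator; formally, for all real $d$,

$\Delta^{-d} = \sum_{j=0}^{\infty}\Delta_j(d)B^j$, $\Delta_j(d) = \frac{\Gamma(j+d)}{\Gamma(d)\Gamma(j+1)}$,

with $\Gamma$ denoting the gamma function such that $\Gamma(d) = \infty$ for $d = 0, -1, -2, \ldots$, and $\Gamma(0)/\Gamma(0) = 1$. Assume the sequence $x_t$ is given by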

(1.2) $x_t = \Delta^{-m_0}v_t^{\#}$, $t \in \mathbb{Z}$,
where $m_0$ is a nonnegative integer,

(1.3) $v_t^{\#} = v_t 1(t \ge 1)$, $t \in \mathbb{Z}$,
for $1(\cdot)$ the indicator function, and

(1.4) $v_t = \Delta^{-\zeta_0}u_t$, $t \in \mathbb{Z}$,

for $|\zeta_0| < \frac{1}{2}$, with $u_t$ a zero-mean covariance stationary process with absolutely continuous spectral distribution function and spectral density $f(\lambda)$ that is at least positive and finite for all $\lambda$.

The process $v_t$ is then also covariance stationary, having "long memory" for $\zeta_0 > 0$, "short memory" for $\zeta_0 = 0$ and "negative memory" for $\zeta_0 < 0$. When $m_0 = 0$, we have $x_t = v_t^{\#} = v_t$ for $t \ge 1$. When $m_0 \ge 1$, $\Delta^{-m_0}$ "integrates" $v_t^{\#}$, and the truncation in (1.2) implies that $x_t$ has variance that is finite, albeit evolving with $t$. Putting $\xi_0 = m_0 + \zeta_0$, $x_t$ is well defined for

(1.5) $\xi_0 \in S \subset \{\xi : -\frac{1}{2} < \xi < \infty, \xi \ne \frac{1}{2}, \frac{3}{2}, \ldots\}$.

The requirement $\xi_0 > -\frac{1}{2}$ excludes noninvertible processes, and the final qualification in (1.5) excludes $\xi_0$ that cannot be reduced to the stationary/invertible region $(-\frac{1}{2}, \frac{1}{2})$ by integer differencing. Alternative definitions of nonstationary fractional $x_t$ are available, for example, $\Delta^{-\xi_0}u_t^{\#}$.
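The fractional filter can be computed from its power-series coefficients without evaluating gamma functions directly. The sketch below assumes the standard expansion of $(1-s)^{-d}$ with coefficients $\Delta_j(d) = \Gamma(j+d)/\{\Gamma(d)\Gamma(j+1)\}$, which gives the recursion $\Delta_0(d) = 1$, $\Delta_j(d) = \Delta_{j-1}(d)(j-1+d)/j$. For simplicity it builds the alternative process $\Delta^{-\xi_0}u_t^{\#}$ mentioned last, reading $u_t^{\#}$ as $u_t 1(t \ge 1)$ (an assumption about the notation), rather than (1.2), which would need the infinite past of $u_t$. All function names are illustrative.

```python
import math

def frac_coeffs(d, n):
    """Return [Delta_0(d), ..., Delta_{n-1}(d)], the coefficients of s^j in (1 - s)^(-d)."""
    coeffs = [1.0]
    for j in range(1, n):
        # Ratio of successive gamma-function coefficients: Delta_j = Delta_{j-1} * (j - 1 + d) / j
        coeffs.append(coeffs[-1] * (j - 1 + d) / j)
    return coeffs

def frac_coeff_gamma(d, j):
    """Direct gamma-function formula; valid when d is not zero or a negative integer."""
    return math.gamma(j + d) / (math.gamma(d) * math.gamma(j + 1))

def truncated_fractional_sum(u, xi):
    """x_t = sum_{j=0}^{t-1} Delta_j(xi) u_{t-j}, t = 1..n, for u = [u_1, ..., u_n]."""
    n = len(u)
    delta = frac_coeffs(xi, n)
    # Python index t corresponds to time t + 1; inputs before time 1 are treated as zero.
    return [sum(delta[j] * u[t - j] for j in range(t + 1)) for t in range(n)]

if __name__ == "__main__":
    print(frac_coeffs(0.5, 4))                           # slowly decaying weights for d = 0.5
    print(truncated_fractional_sum([1.0, 2.0, 3.0], 1.0))  # xi = 1 reduces to partial sums
```

With `xi = 1` every weight equals one, so the filter returns cumulative sums, and with `d = 0` the recursion returns the identity filter, in line with the convention $\Gamma(0)/\Gamma(0) = 1$.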

Suppose $\xi_0$ is unknown; $m_0$ may also be unknown. Suppose $u_t$ is assumed to have parametric autocorrelation,

(1.6) $f(\lambda) = \frac{\sigma_0^2}{2\pi}|\beta(e^{i\lambda};\nu_0)|^2$, $\lambda \in (-\pi, \pi]$,

such that $\mathrm{cov}(u_0, u_j) = \int_{-\pi}^{\pi} f(\lambda)\cos(j\lambda)\,d\lambda$, $j \in \mathbb{Z}$, $\beta(s;\nu)$ is a smooth given function of complex-valued $s$ and the column-vector $\nu \in \mathbb{R}^{p_1-1}$, $p_1 \ge 1$, satisfying

(1.7) $\beta_0(\nu) = 1$, $\beta(s;\nu) \ne 0$, $|s| \le 1$, $\nu \in V$,

where $\beta_j(\nu) = \int_{-\pi}^{\pi}\beta(e^{i\lambda};\nu)\cos(j\lambda)\,d\lambda$, and $\nu_0 \in V$ and $\sigma_0^2 > 0$ are unknown.

Then $\sigma_0^2$ is the variance of the one-step-ahead prediction error of the best linear predictor for $u_t$. For example, $u_t$ can be a standardly parameterized autoregressive moving average (ARMA) process of autoregressive (AR) order $p_{11}$ and moving average (MA) order $p_{12}$, such that $p_1 - 1 \le p_{11} + p_{12} < \infty$; when $\nu_0$ consists precisely of the AR and MA coefficients we have $p_{11} + p_{12} = p_1 - 1$; otherwise the coefficients obey prior restrictions. We call $v_t$ a FARIMA($p_{11}, \zeta_0, p_{12}$) and $x_t$ a FARIMA($p_{11}, \xi_0, p_{12}$). Whereas $v_t$ is stationary, due to the truncation (1.2) $x_t$ is nonstationary even when $\xi_0 < \frac{1}{2}$ (it could be called "asymptotically stationary" then). The case when $x_t = v_t$ for all $t \in \mathbb{Z}$, so $x_t$ is stationary, can be dealt with similarly, but we impose the truncation in (1.2) for all $m_0 \ge 0$ for the sake of a unified presentation. The set $V$ is contained in the "stationary and invertible region." The case $p_1 = 1$ means $\nu_0$ is empty, and if $\beta \equiv 1$, $x_t$ is a FARIMA(0, $\xi_0$, 0). An alternative model for $u_t$ is due to Bloomfield [4].

The main focus of the paper is estimation of $\theta_{01} = (\xi_0, \nu_0^T)^T$, and we restrict to a specialized form of $z_t$ in (1.1):

(1.8) $z_t = (t^{\tau_1}, \ldots, t^{\tau_q})^T 1(t \ge 1)$, $\tau_1 < \tau_2 < \cdots < \tau_q$,

where the $\tau_j$ are real valued. Debate has centered on the origin (deterministic or stochastic) of nonstationarity in time series. A notable feature is competition at low frequencies, and given the fractional model for $x_t$ this is most neatly expressed by (1.8). Some components of $z_t$ may have negligible effect on fractionally differenced $y_t$. Denote by $\mu_j$ the $j$th element of $\mu$ and $\mathcal{T}_1 = \{j : \tau_j < \xi_0 - \frac{1}{2}\}$, $\mathcal{T}_2 = \{j : \tau_j = \xi_0\}$, $\mathcal{T}_3 = \{j : \xi_0 - \frac{1}{2} \le \tau_j < \xi_0;\ \tau_j > \xi_0\}$, where any of these sets can be empty. We cannot estimate $\mu_j$ for $j \in \mathcal{T}_1$, and do not discuss estimation of $\mu_j$ for $j \in \mathcal{T}_2$. Write $s_t = \sum_{j \in \mathcal{T}_1}\mu_j t^{\tau_j}$ and for $p_2 = \#\mathcal{T}_3 \le q$ introduce the $p_2 \times 1$ vectors $z_{2t}$ and $\theta_{02}$, whose $j$th elements are the elements of $z_t$ and $\mu$ whose index is the $j$th largest element of $\mathcal{T}_3$. It will be convenient to write $z_{2t} = (t^{\chi_1}, \ldots, t^{\chi_{p_2}})^T$, where the $\chi_j$ are appropriate elements $\tau_j$, and satisfy $\xi_0 - \frac{1}{2} \le \chi_1 < \cdots < \chi_{p_2}$. We can write (1.1) as

(1.9) $y_t = s_t + \mu^* t^{\xi_0} + \theta_{02}^T z_{2t} + x_t$,

where $\mu^* = 0$ if $\tau_j \ne \xi_0$ for all $j$.

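We discuss estimation of $\theta_{02}$, along with $\theta_{01}$. For this we require that the $\tau_j$, $j \in \mathcal{T}_3$, are known. The boundary case of $\mathcal{T}_3$, $\tau_j = \xi_0 - \frac{1}{2}$, thus strictly implies $\xi_0$ is known, but this provision is instead designed to cover a situation in which $\tau_j < \xi_0 - \frac{1}{2}$ for all $j \in \mathcal{T}_1$ is anticipated, with $\xi_0$ unknown, but in fact $\tau_j = \xi_0 - \frac{1}{2}$ for some $j$. For $\theta_1 = (\xi, \nu^T)^T \in S \times V$, introduce the function $\alpha(s;\theta_1) : \mathbb{R} \times \mathbb{R}^{p_1} \to \mathbb{R}$, and consider $\alpha(s;\theta_1^{(-)})$, where $\theta_1^{(-)} = (0, \nu^T)^T$, such that

(1.10) $\alpha(s;\theta_1) = (1-s)^{\xi}\alpha(s;\theta_1^{(-)})$.

Take $\alpha(s;\theta_1^{(-)}) = \beta(s;\nu)^{-1}$ for $|s| \le 1$, $\nu \in V$, and note that $\int_{-\pi}^{\pi}\alpha(e^{i\lambda};\theta_1^{(-)})\,d\lambda = 1$, $\nu \in V$. From (1.6) and (1.7), $u_t$ has one-sided AR representation

(1.11) $\alpha(B;\theta_{01}^{(-)})u_t = \sigma_0\varepsilon_t$, $t \in \mathbb{Z}$,

where $\theta_{01}^{(-)} = (0, \nu_0^T)^T$, and the $\varepsilon_t$ are uncorrelated with zero mean and unit variance. Introduce square-summable coefficients $\alpha_j(\theta_1)$ in the expansion

(1.12) $\alpha(s;\theta_1) = \sum_{j=0}^{\infty}\alpha_j(\theta_1)s^j$, $|s| < 1$, $\xi \in S$, $\nu \in V$,

so $\alpha_0(\theta_1) = 1$. For given $\theta = (\theta_1^T, \theta_2^T)^T$, define the computable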

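$e_t(\theta) = \sum_{j=0}^{t-1}\alpha_j(\theta_1)(y_{t-j} - \theta_2^T z_{2,t-j})$, $t \ge 1$,

(1.13)

$E_t(\theta) = e_t(\theta) - \frac{1}{n}\sum_{s=1}^{n}e_s(\theta)$, $t \ge 1$,

the latter being proxies for $\sigma_0\varepsilon_t$, with $s_t$ ignored in $e_t(\theta)$ because it is anticipated to have negligible effect, and $\mu^* t^{\xi_0}$ ignored in view of the mean-correction in $E_t(\theta)$.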

Given observations $y_t$, $t = 1, \ldots, n$, define

(1.14) $Q_\rho(\theta,\theta_3) = \frac{1}{n}\sum_{t=1}^{n}\rho(E_t(\theta)/\tilde\sigma;\theta_3)$,

for an $n^{1/2}$-consistent estimate $\tilde\sigma$ of $\sigma_0$, a given nonnegative function $\rho : \mathbb{R} \times \mathbb{R}^{p_3} \to \mathbb{R}$ and any admissible value $\theta_3$ of an unknown $p_3 \times 1$ parameter vector $\theta_{03}$; $\theta_3$ may be empty, as when $\rho(s;\theta_3) = s^2$. Consider the estimate $(\hat\theta_\rho, \hat\theta_{3\rho}) = \arg\min_{\Theta\times\Theta_3}Q_\rho(\theta,\theta_3)$, for compact sets $\Theta \subset \mathbb{R}^{p_1+p_2}$, $\Theta_3 \subset \mathbb{R}^{p_3}$. One anticipates (see, e.g., Martin's [24] discussion of M-estimates of ARMA models) that under suitable conditions $\hat\theta_\rho$, $\hat\theta_{3\rho}$ are asymptotically independent and the asymptotic variance matrix of $\hat\theta_\rho$ depends on $\rho$ only through the scalar factor $T_1 = \int\rho'(s)^2 g(s)\,ds/\{\int\rho''(s)g(s)\,ds\}^2$, where the prime indicates differentiation, double-prime indicates twice differentiation and reference to $\theta_{03}$ is suppressed. If integration by parts can be conducted, this and the Schwarz inequality indicate that $T_1 \ge \mathcal{J}^{-1}$, defining the information

(1.15) $\mathcal{J} = \int\psi(s)^2 g(s)\,ds$
and the score function

(1.16) $\psi(s) = -g'(s)/g(s)$.

The lower bound is attained by $\hat\theta_{-\log g}$, and the paper obtains estimates that are efficient in the sense of having the same asymptotic variance as $\hat\theta_{-\log g}$. In Theorem 2 of Section 3 we justify such an estimate on the basis of known $g(s;\theta_3)$. If $g$ is misspecified, not only will the estimate not be efficient but it may even be inconsistent. Our main result is Theorem 1 of Section 3, which justifies efficient semiparametric estimates, in which the density of $\varepsilon_t$ is nonparametric. These estimates are adaptive in the sense of Stone [28] and are described in the following section. Section 4 describes a Monte Carlo study of finite sample behavior of the semiparametric estimates. Section 5 attempts to place the work in perspective, relative to the literature. Section 6 presents the main proof details, which use a series of lemmas that make up Section 7. Some of these, such as Lemmas 1, 2, 7, 8, 13, 15 and 16, may be useful in other work. A principal technical feature is our handling of the approximation of the $\sigma_0\varepsilon_t$ in (1.11) by the $e_t(\theta_0)$ defined by (1.13), a delicate matter in fractional models.
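The lower bound on the scalar variance factor quoted before (1.15) can be checked in one line. The following is a sketch under the added assumption that $\rho'(s)g(s) \to 0$ as $|s| \to \infty$, so that the boundary term in the integration by parts vanishes; it uses only the definitions (1.15) and (1.16).

```latex
\int \rho''(s)\,g(s)\,ds
  = -\int \rho'(s)\,g'(s)\,ds
  = \int \rho'(s)\,\psi(s)\,g(s)\,ds ,
\qquad
\Big\{\int \rho'(s)\,\psi(s)\,g(s)\,ds\Big\}^{2}
  \le \int \rho'(s)^{2} g(s)\,ds \int \psi(s)^{2} g(s)\,ds ,
\qquad\text{hence}\qquad
\frac{\int \rho'(s)^{2} g(s)\,ds}{\big\{\int \rho''(s)\,g(s)\,ds\big\}^{2}}
  \ge \Big\{\int \psi(s)^{2} g(s)\,ds\Big\}^{-1} .
```

Equality in the Schwarz step requires $\rho'$ proportional to $\psi$, which is the case for $\rho = -\log g$.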

2. Semiparametric estimates. As in much adaptive estimation literature we take an approximate Newton step from an initial consistent estimate $\tilde\theta$ of $\theta_0$, with the same rate of convergence as $\hat\theta_{-\log g}$. This requires estimating $\psi(s)$. We employ an approach developed by Beran [2] and Newey [25]. Beran [2] proposed a series estimate of $\psi(s)$ [with respect to innovations in an AR($p$) model] that employs integration by parts. His estimate of $\psi(s)$ was actually not a smoothed nonparametric one because he fixed the number of terms, $L$, in the series. Newey [25] allowed $L$ to increase slowly with $n$, in adapting to error distribution of unknown form in cross-sectional regression.

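Let $\phi_\ell(s)$, $\ell = 1, 2, \ldots$, be a sequence of given, continuously differentiable functions. For $L \ge 1$, scalar $h_t$, $t = 1, \ldots, n$, and $h = (h_1, \ldots, h_n)^T$, define $\phi^{(L)}(h_t) = (\phi_1(h_t), \ldots, \phi_L(h_t))^T$, $\Phi^{(L)}(h_t) = \phi^{(L)}(h_t) - n^{-1}\sum_{s=1}^{n}\phi^{(L)}(h_s)$, $\phi'^{(L)}(h_t) = (\phi'_1(h_t), \ldots, \phi'_L(h_t))^T$ and

$W^{(L)}(h) = n^{-1}\sum_{t=1}^{n}\Phi^{(L)}(h_t)\Phi^{(L)}(h_t)^T$,

$w^{(L)}(h) = n^{-1}\sum_{t=1}^{n}\phi'^{(L)}(h_t)$,

$\tilde a^{(L)}(h) = W^{(L)}(h)^{-1}w^{(L)}(h)$,

$\psi^{(L)}(h_t;\tilde a^{(L)}(h)) = \tilde a^{(L)}(h)^T\Phi^{(L)}(h_t)$.

With $E(\theta) = (E_1(\theta), \ldots, E_n(\theta))^T$ define

$\tilde\psi_t^{(L)}(\theta,\sigma) = \psi^{(L)}(E_t(\theta)/\sigma;\tilde a^{(L)}(E(\theta)/\sigma))$,

where it will follow from our conditions that in a neighborhood of $\theta_0$, $\sigma_0$, $W^{(L)}(E(\theta)/\sigma)$ is nonsingular with probability approaching 1 as $n \to \infty$. We then compute the $\tilde\psi_t^{(L)}(\tilde\theta,\tilde\sigma)$. Following Beran [2] and Newey [25] we have approximated $\psi(\varepsilon_t)$ by $\sum_{\ell=1}^{L}a_\ell\{\phi_\ell(\varepsilon_t) - E\phi_\ell(\varepsilon_t)\}$ [imposing the restriction $E\psi(\varepsilon_t) = 0$], noted that (under conditions to be given) integration by parts implies $E\{\phi^{(L)}(\varepsilon_t)\psi(\varepsilon_t)\} = E\{\phi'^{(L)}(\varepsilon_t)\}$, estimated $(a_1, \ldots, a_L)^T$ by $\tilde a^{(L)}(E(\tilde\theta)/\tilde\sigma)$, and then $\psi(\varepsilon_t)$ by $\tilde\psi_t^{(L)}(\tilde\theta,\tilde\sigma)$.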
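The series score estimate just described can be sketched in a few lines. The code below is an illustration under stated assumptions, not the paper's implementation: it takes the power basis $\phi_\ell(s) = \phi(s)^\ell$, forms the moment matrix of the centred basis and the vector of averaged basis derivatives, and solves the resulting linear system for the coefficient vector, which is the sample analogue of the integration-by-parts identity above. The helper names `solve` and `series_score` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def series_score(h, L, phi=lambda s: s, dphi=lambda s: 1.0):
    """Return (psi, a): estimated score at each h_t and the series coefficients."""
    n = len(h)
    # Basis phi(s)^l and its derivative l * phi(s)^(l-1) * phi'(s), l = 1..L.
    basis = [[phi(s) ** l for l in range(1, L + 1)] for s in h]
    dbasis = [[l * phi(s) ** (l - 1) * dphi(s) for l in range(1, L + 1)] for s in h]
    means = [sum(row[l] for row in basis) / n for l in range(L)]
    Phi = [[row[l] - means[l] for l in range(L)] for row in basis]   # centred basis
    W = [[sum(r[i] * r[j] for r in Phi) / n for j in range(L)] for i in range(L)]
    w = [sum(row[l] for row in dbasis) / n for l in range(L)]
    a = solve(W, w)
    psi = [sum(a[l] * r[l] for l in range(L)) for r in Phi]
    return psi, a

if __name__ == "__main__":
    psi, a = series_score([-2.0, -1.0, 0.0, 1.0, 2.0], L=1)
    print(a, psi)   # with phi(s) = s and L = 1 the estimate is h_t / (sample variance)
```

With $\phi(s) = s$ and $L = 1$ the estimated score is linear in its argument, which is the form of the Gaussian score; larger $L$ lets the estimate bend toward other shapes.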
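Define [see (1.10)-(1.13)]

$e'_{t1}(\theta) = \alpha'(B;\theta_1)(y_t - \theta_2^T z_{2t})$, $e'_{t2}(\theta) = -\alpha(B;\theta_1)z_{2t}$,

where

$\alpha'(s;\theta_1) = (\partial/\partial\theta_1)\alpha(s;\theta_1) = (1-s)^{\xi}\alpha(s;\theta_1^{(-)})\gamma(s;\nu)$,

with

(2.1) $\gamma(s;\nu) = [\log(1-s), \{(\partial/\partial\nu)^T\alpha(s;\theta_1^{(-)})\}/\alpha(s;\theta_1^{(-)})]^T$.

Define

$E'_{ti}(\theta) = e'_{ti}(\theta) - n^{-1}\sum_{s=1}^{n}e'_{si}(\theta)$, $i = 1, 2$,

$r_{Li}(\theta,\sigma) = \sum_{t=1}^{n}\tilde\psi_t^{(L)}(\theta,\sigma)E'_{ti}(\theta)$, $R_i(\theta) = \sum_{t=1}^{n}E'_{ti}(\theta)E'_{ti}(\theta)^T$, $i = 1, 2$,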
t=i t=i 

n 

t=i 

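Estimate $\theta_{01}$, $\theta_{02}$ by

(2.2) $\hat\theta_i = \tilde\theta_i + \{R_i(\tilde\theta)\tilde{\mathcal{J}}_L(\tilde\theta,\tilde\sigma)\}^{-1}r_{Li}(\tilde\theta,\tilde\sigma)$, $i = 1, 2$,

respectively, for $\tilde\theta = (\tilde\theta_1^T, \tilde\theta_2^T)^T$.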

As in [25] we restrict to $\phi_\ell(s)$ satisfying

(2.3) $\phi_\ell(s) = \phi(s)^\ell$,
for a smooth function $\phi(s)$. Examples are

(2.4) $\phi(s) = s$,

(2.5) $\phi(s) = s(1 + s^2)^{-1/2}$.

Our conditions require $L$ to increase very slowly with $n$, and allow the increase to be arbitrarily slow; in practice, for moderate $n$, (2.2) might be computed for a few small integers $L$, starting with $L = 1$. Recursive formulas are available, using partitioned regression, such that the elements of $W^{(L)}(E(\tilde\theta)/\tilde\sigma)$, $w^{(L)}(E(\tilde\theta)/\tilde\sigma)$ can be used in computing $\tilde\psi_t^{(L+1)}(\tilde\theta,\tilde\sigma)$.

3. Main results. We introduce the following regularity conditions. Throughout the paper $C$ denotes a finite but arbitrarily large constant.

Assumption A1. The sequence $y_t$ is generated by (1.1) with $x_t$ generated by (1.2)-(1.4) and (1.11), where the $\varepsilon_t$ are independent and identically distributed (i.i.d.) with zero mean and variance 1, and $z_t$ is given by (1.8).

Assumption A2. Either:

(a) $E\varepsilon_0^4 < \infty$; or

(b) for some $\omega > 0$ the moment generating function $E(e^{t|\varepsilon_0|^{\omega}})$ exists for some $t > 0$; or

(c) $\varepsilon_0$ is almost surely bounded.

Assumption A3. $\varepsilon_0$ has density $g(s)$ that is differentiable and

$0 < \mathcal{J} < \infty$,

where $\mathcal{J}$ is defined in (1.15).

Assumption A4. The sentence including (1.6) and (1.7) is true, $\nu_0$ is an interior point of $V$, and in a neighborhood $\mathcal{N}$ of $\nu_0$, $\alpha(s;\theta_1^{(-)}) = \beta(s;\nu)^{-1}$ is thrice continuously differentiable in $\nu$ for $|s| = 1$ and

oo , 

E^'N l/^i(^o)| +sup|a,(ei-))| + sup|af (^i"))! 

+ sup|af'')(e(-))|+sup|af'''"^)(0i-))||<oo, 
N N J 

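for all $k, \ell, m = 1, \ldots, p_1 - 1$, where $\alpha_j(\theta_1^{(-)})$ is defined by (1.10), (1.12) and $\alpha_j^{(k)}(\theta_1^{(-)}) = (\partial/\partial\nu_k)\alpha_j(\theta_1^{(-)})$, $\alpha_j^{(k,\ell)}(\theta_1^{(-)}) = (\partial/\partial\nu_\ell)\alpha_j^{(k)}(\theta_1^{(-)})$, $\alpha_j^{(k,\ell,m)}(\theta_1^{(-)}) = (\partial/\partial\nu_m)\alpha_j^{(k,\ell)}(\theta_1^{(-)})$, $\nu_k$ being the $k$th element of $\nu$.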

Assumption A5. For all $(p_1 - 1) \times 1$ nonnull vectors $\lambda$, $\lambda^T\{(\partial/\partial\nu)\alpha(e^{i\mu};\theta_{01}^{(-)})\}\beta(e^{i\mu};\nu_0) \ne 0$ on a subset of $(-\pi, \pi]$ of positive measure.

Assumption A6.

$0 < \sigma_0^2 < \infty$.

Assumption A7.

$n^{1/2}(\tilde\theta_1 - \theta_{01}) = O_p(1)$, $D_n(\tilde\theta_2 - \theta_{02}) = O_p(1)$, $n^{1/2}(\tilde\sigma^2 - \sigma_0^2) = O_p(1)$,
where

Assumption A8. $\phi_\ell(s)$ satisfies (2.3), where $\phi(s)$ is strictly increasing and thrice continuously differentiable and is such that, for some $\kappa \ge 0$, $K < \infty$,

(3.1) $|\phi(s)| \le 1(|s| \le 1) + |s|^{\kappa}1(|s| > 1)$,

(3.2) $|\phi'(s)| + |\phi''(s)| + |\phi'''(s)| \le C(1 + |\phi(s)|^{K})$.

Assumption A9.

(3.3) $L \to \infty$ as $n \to \infty$
and either:

(a)

(3.4) $\liminf_{n\to\infty}\left(\frac{\log n}{L}\right) > 8\{\log\eta + \max(\log\varphi, 0)\} \approx 7.05 + 8\max(\log\varphi, 0)$;

or

(b) 

(3.5) liminf f-—^—-^ > max f — , — ^ — — — ^ 

n^oo \ L log L / \ OJ 00 

or 

(c)

(3.6) $\liminf_{n\to\infty}\left(\frac{\log n}{L\log L}\right) > 4\kappa$,

where

$\eta = 1 + 2^{1/2} \approx 2.414$

and

$\varphi = \frac{1 + |\phi(s_1)|}{\phi(s_2) - \phi(s_1)}$,

$[s_1, s_2]$ being an interval on which $g(s)$ is bounded away from zero.
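The numerical constant in (3.4) is easy to check, and doing so shows how slowly $L$ may grow. The sketch below evaluates $8\log\eta$ for $\eta = 1 + 2^{1/2}$ and, purely as arithmetic, asks how large $n$ must be for $\log n / L$ to exceed that value when the second logarithmic term in (3.4) is zero. Reading the asymptotic condition as a finite-$n$ inequality is an assumption made only for illustration, and the names used are hypothetical.

```python
import math

ETA = 1.0 + math.sqrt(2.0)          # the constant eta, about 2.414
BOUND_34 = 8.0 * math.log(ETA)      # right side of (3.4) when the second log term is 0

def smallest_n(L):
    """Smallest integer n with log(n) / L > BOUND_34 (finite-n reading, for illustration)."""
    return math.floor(math.exp(BOUND_34 * L)) + 1

if __name__ == "__main__":
    print(round(ETA, 3), round(BOUND_34, 2))
    for L in (1, 2, 3):
        print(L, smallest_n(L))
```

Because the threshold grows like $\eta^{8L}$, even small values of $L$ correspond to very large $n$ under this literal reading, so the condition constrains the rate of growth of $L$ rather than suggesting a usable value in a given sample.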

Remark 1. Parts (a), (b) and (c) of Assumption A2 increase in strength and entail trade-offs with Assumptions A8 and A9. When $\kappa = 0$ in Assumption A8, so $\phi(s)$ is bounded, (a) of Assumption A2 and (a) of Assumption A9 suffice; a finite fourth moment seems hard to avoid in dealing with the deviations $e_t(\theta_0) - \sigma_0\varepsilon_t$. Part (b) of Assumption A2 holds with $\omega = 1$ for Laplace $\varepsilon_t$ and with $\omega = 2$ for Gaussian $\varepsilon_t$. We require (b) of Assumption A2 when $\kappa > 0$ in Assumption A8, so $\phi(s)$ can be unbounded, and also (b) of Assumption A9. If (c) of Assumption A2 holds, then a fortiori we can have $\kappa > 0$ in Assumption A8, and can relax (b) of Assumption A9 to (c).

Remark 2. Assumption A3 is virtually necessary. 



Remark 3. Assumption A4 is stronger than necessary, but is chosen for brevity of presentation and because it is readily checked for short memory and invertible AR ($\alpha$) and MA ($\beta$) filters arising in models of most practical interest, such as ARMA and Bloomfield [4] models, and in any case conditions on the short-memory component are of only secondary interest here. A property useful in several places (see in particular Lemma 13 of Section 7) that is ensured by Assumption A4 is as follows. A (possibly vector) sequence $a_j$, $j \ge 0$, has property $P_r(d)$, $r \ge 0$, if

$\|a_j\| \le C\{\log(j+2)\}^r(j+1)^{d-1}$,

$\|a_j - a_{j+1}\| \le C\{\log(j+2)\}^r(j+1)^{d-2}$,



where || • || denotes Euclidean norm. For |s| < 1 and 0} 
square-summable TTj[9^^^) such that 

oo 



(+) 



i>o, 



T\T 



define 



i+h 



7r(s; 
Then, with 



= (1 

(Co, 1^1 



"'^/?(s;' 



) ' 



erty Po(-Co) >nd (a/a/9<+"')a^{eJt' 



q{^^) has prop- 



has property Po(Co)) ctj 

has property Pi{—Co)- This follows 



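from Lemmas 11 and 12 of Section 7 on noting that, for $\alpha(s) = \sum_{j=0}^{\infty}\alpha_j s^j$,
$\beta(s) = \sum_{j=0}^{\infty}\beta_j s^j$, the coefficient of $s^j$ in $\alpha(s)\beta(s)$ is $\sum_{k=0}^{j}\alpha_k\beta_{j-k}$, that the
coefficients of $s^j$ in $(1-s)^{-d}$ and $-\log(1-s)$ are $\Delta_j(d)$ and $j^{-1}$, that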
7r(l; e^^^) = for Co < 0, and that a(l; 6^^^) = 0, (a/9/0W^)Q(l; 6^^^) = 
for Co > 0. 



Remark 4. Assumption A5 is an identifiability condition, violated if, 
for example, ut is specified as an ARMA with both AR and MA orders 
overstated. Assumption A5, with Assumption A4, implies that 

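(3.7) $\Omega_1 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\gamma(e^{i\lambda};\nu_0)\gamma(e^{-i\lambda};\nu_0)^T\,d\lambda = \frac{1}{4\pi}\int_{-\pi}^{\pi}\begin{bmatrix}\log|1-e^{i\lambda}|^2 \\ -2\frac{\partial}{\partial\nu}\log|\beta(e^{i\lambda};\nu_0)|\end{bmatrix}\begin{bmatrix}\log|1-e^{i\lambda}|^2 \\ -2\frac{\partial}{\partial\nu}\log|\beta(e^{i\lambda};\nu_0)|\end{bmatrix}^T d\lambda$

is positive definite, with $\gamma$ given by (2.1). $\Omega_1$ is proportional to the inverse of the limiting covariance matrix of $\hat\theta_1$. We define also the corresponding matrix with respect to $\theta_2$,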

J^2 = ^/3(i;^o)' 

{2{x^ - Co) + lV/H2{xj - 6) + lV/\x^ - io){Xj - Co) \ 
iXi + Xj - 26 + l){Xi - 6 + l)(Xi -6 + 1) J' 

when $\chi_1 - \xi_0 > -\frac{1}{2}$, where the $(i,j)$th element of the matrix is displayed; because $((\chi_i + \chi_j - 2\xi_0 + 1)^{-1})$ is a Cauchy matrix (see [17], page 30), and the inequalities in (1.8) hold, $\Omega_2$ is positive definite. The same is true when $\tau_j - \xi_0 = -\frac{1}{2}$ for some $j$, $\Omega_2$ being defined by replacing the $(1,1)$th element of the matrix in (3.8) by 1, and the other elements in the first row and column by zero.

Remark 5. The middle part of Assumption A7 is likely to be satisfied by the least-squares estimate of $\theta_{02}$, under similar conditions to ours. A substantial literature justifies $\tilde\theta_1$ satisfying Assumption A7; typically $\theta_{02}^T z_{2t}$ is assumed constant a priori, but the results should go through more generally with $x_t$ replaced by least-squares residuals. Various estimates of $\theta_{01}$ (which we collectively call Whittle estimates) have been shown to be $n^{1/2}$-consistent and asymptotically $N(0, \Omega_1^{-1})$ when $0 < \xi_0 < \frac{1}{2}$ under Gaussianity of $x_t$ (when they achieve the efficiency bound of Section 1 and are as good as maximum likelihood estimates), and under more general conditions (see, e.g., [6, 9, 11, 16]). The estimate minimizing (1.14) with $\rho(s) = s^2$ [usually with $E_t(\theta)$ replaced by $e_t(\theta)$] falls within this class. This estimate (used by Li and McLeod [21] for fractional models and Box and Jenkins [5] for ARMA ones) is sometimes called a conditional sum of squares (CSS) estimate (though it is based on formulas for the truncated AR representation rather than for the conditional expectation given the finite past record). Beran [1] argued that it has the same desirable asymptotic properties for $\xi_0 > \frac{1}{2}$, tying in with Robinson's [26] derivation of standard asymptotics for score tests, based on the same objective function, for unit root and more general nonstationary hypotheses against fractional alternatives. These authors employed a different definition of fractional nonstationarity from ours, but for our definition Velasco and Robinson [29] established the same properties for a Whittle estimate when $-\frac{1}{2} < \xi_0 < \frac{3}{4}$, and for a tapered version of this for $-\frac{1}{2} < \xi_0 < \infty$, though the tapering inflates asymptotic variance. They established consistency of their implicitly defined optimizer despite lack of uniform convergence over an admissible parameter set that includes a wide range of nonstationary values of $\xi_0$. Taking a Newton step from a previously established $n^{1/2}$-consistent estimate avoids repeating this kind of work. Velasco and Robinson's [29] estimate of $\sigma_0^2$ should satisfy the final part of Assumption A7 [with (a) sufficient within Assumption A2].

Remark 6. When $\kappa = 0$ in Assumption A8, then $|\phi(s)| \le 1$ for all $s$, under (3.1); there would be no gain in generality by specifying $\phi$ to satisfy a larger finite bound. For $\kappa > 0$ we might take $\phi(s) = s^{\kappa}$; compare (2.4). The reason for imposing different bounds on $\phi(s)$ over $|s| \le 1$ and $|s| > 1$ is to allow possibly different rates of approach to zero and infinity. Assumption A8 is stronger than the corresponding assumption of Newey [25], and is driven by the presence of $e_t(\theta_0)$ for small $t$, when it does not approximate $\sigma_0\varepsilon_t$; we prefer this to trimming out small $t$, which introduces further ambiguity. It is hard to think of reasons for choosing $\phi$ that do not satisfy (3.1), (3.2), which imply power-law bounds on $\phi'(s)$, $\phi''(s)$ and $\phi'''(s)$ as $s \to \infty$.

Remark 7. The weakest of the conditions in Assumption A9, (a), can only apply when $\kappa = 0$ in Assumption A8, in which case $\log\varphi \ge 0$. Subject to this, the hope is that $s_1$ and $s_2$ exist such that $\varphi$ is arbitrarily close to 1, as when $g(s) > 0$ for all $s$; then the strict inequality in (3.4) applies with $\log\varphi = 0$. The mysterious constant $\eta$ is due to approximating $W^{(L)}$ in the proof in terms of the Cauchy matrix with $(i,j)$th element $\int_{-1}^{1}u^{i+j-2}\,du$ (see Lemma 7 of Section 7). Since $\phi$ is defined for negative and positive arguments, this seems more natural than Newey's [25] use of the Hilbert matrix $(\int_0^1 u^{i+j-2}\,du)$ and affords some slight improvement over it due to the many zero elements in this Cauchy matrix; following a similar proof to that of Lemma 7 for the Hilbert matrix, $\eta$ would be replaced by $\eta^2 \approx 5.828$. In fact, a constant such as $\eta$ does not arise in Newey's work because he is content with a slightly stronger condition than any in Assumption A9, $L\log L/\log n \to 0$, irrespective of whether or not $\phi$ is bounded, and without considering the impact of bounded $\varepsilon_t$. This is because he accepts a bound of form $L^{CL}$ at several points of his proof. Our slightly sharper bounds suggest that when $\phi$ is bounded it is effectively the denominator of $\tilde a^{(L)}$ (i.e., the inverse of $W^{(L)}$) that dominates, while when $\phi$ is unbounded the numerator dominates. In the former case, the slow $L$ corresponds to the notorious ill-conditioning of Cauchy-Hilbert matrices. One disadvantage of a bounded $\phi$ is that a larger $L$ might be needed to approximate an unbounded $\psi$, though our slightly milder condition on $L$ in Assumption A9(a) might help to justify this. Another is that it excludes (2.4), which "nests" the Gaussian case, though it would be possible to modify our theory to allow inclusion of $\phi_1(s) = s$, say, followed by polynomial $\phi_\ell$ (2.3) using bounded $\phi$ such as (2.5). Though the partly known nature of the bounds in Assumption A9 is interesting, and their reflection of other assumptions is intuitively reasonable in a relative sense, not only is the improvement over Newey's rate slight, but even after guessing $\omega$ and $\varphi$, no practical choices of $L$ in finite samples can be concluded; indeed the same asymptotic bounds result if any fixed integer is added to or subtracted from $L$. As in much other semiparametric work, no information toward an optimal choice of $L$ emerges; indeed, as in [25] there is no lower bound on $L$, and besides that it must increase with $n$.

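Theorem 1. Let Assumptions A1-A9 hold, such that when $\kappa = 0$ Assumption A2(a) holds with Assumption A9(a), or when $\kappa > 0$ either Assumption A2(b) holds with Assumption A9(b) or Assumption A2(c) holds with Assumption A9(c). Then as $n \to \infty$, $n^{1/2}(\hat\theta_1 - \theta_{01})$ and $D_n(\hat\theta_2 - \theta_{02})$ converge in distribution to independent $N(0, \mathcal{J}^{-1}\Omega_1^{-1})$, $N(0, \mathcal{J}^{-1}\Omega_2^{-1})$ vectors, respectively, where the limiting covariance matrices are consistently estimated by $\{\tilde{\mathcal{J}}_L(\tilde\theta,\tilde\sigma)R_1(\tilde\theta)/n\}^{-1}$, $\{\tilde{\mathcal{J}}_L(\tilde\theta,\tilde\sigma)D_n^{-1}R_2(\tilde\theta)D_n^{-1}\}^{-1}$, respectively.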

To place Theorem 1 in perspective and to further balance the focus on Whittle estimation in the long-memory literature, we also consider the fully parametric case, where $g(s;\theta_3)$ is a prescribed parametric form, as described after (1.14), on the basis of which define $\hat\theta_3 = \arg\min_{\Theta_3}Q_{-\log g}(\tilde\theta;\theta_3)$, and, with $\psi(s;\theta_3) = -(\partial/\partial s)g(s;\theta_3)/g(s;\theta_3)$,

n 

t=l 

n 

t=l 

and redefine 9i, i = 1,2, of (2.2) as 

e^ = 9i + {Ri{e)Me, a, m-^r,{e, e^), i = i,2. 

We introduce the following additional assumptions. 

Assumption A10. $\Theta_3$ is compact and $\theta_{03}$ is an interior point of $\Theta_3$.

Assumption A11. For all $\theta_3 \in \Theta_3 - \{\theta_{03}\}$, $g(s;\theta_3) \ne g(s;\theta_{03})$ on a set of positive measure.

Assumption A12. In a neighborhood $\mathcal{N}$ of $\theta_{03}$, $\log g(s;\theta_3)$ is thrice continuously differentiable in $\theta_3$ for all $s$ and

$\int\left\{\sup_{\mathcal{N}}|g^{(k)}(s;\theta_3)| + \sup_{\mathcal{N}}|g^{(k,\ell)}(s;\theta_3)| + \sup_{\mathcal{N}}|g^{(k,\ell,m)}(s;\theta_3)|\right\}ds < \infty$,

where $g^{(k)}$, $g^{(k,\ell)}$, $g^{(k,\ell,m)}$ represent partial derivatives of $g$ with respect to the $k$th, the $k$th and $\ell$th, and the $k$th, $\ell$th and $m$th elements of $\theta_3$, respectively.

Assumption A13. $\Omega_3 = E\{(\partial/\partial\theta_3)\log g(\varepsilon_0;\theta_{03})(\partial/\partial\theta_3^T)\log g(\varepsilon_0;\theta_{03})\}$ is positive definite.

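Theorem 2. Let Assumptions A1, A2(a), A3-A7 and A10-A13 hold. Then as $n \to \infty$, $n^{1/2}(\hat\theta_1 - \theta_{01})$, $D_n(\hat\theta_2 - \theta_{02})$ and $n^{1/2}(\hat\theta_3 - \theta_{03})$ converge in distribution to independent $N(0, \mathcal{J}^{-1}\Omega_1^{-1})$, $N(0, \mathcal{J}^{-1}\Omega_2^{-1})$ and $N(0, \Omega_3^{-1})$ vectors, respectively, where the limiting covariance matrices are consistently estimated by $\{\mathcal{J}_n(\tilde\theta,\tilde\sigma,\hat\theta_3)R_1(\tilde\theta)/n\}^{-1}$, $\{\mathcal{J}_n(\tilde\theta,\tilde\sigma,\hat\theta_3)D_n^{-1}R_2(\tilde\theta)D_n^{-1}\}^{-1}$ and

$\left[n^{-1}\sum_{t=1}^{n}\{(\partial/\partial\theta_3)\log g(E_t(\tilde\theta)/\tilde\sigma;\hat\theta_3)\}\{(\partial/\partial\theta_3^T)\log g(E_t(\tilde\theta)/\tilde\sigma;\hat\theta_3)\}\right]^{-1}$,

respectively.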

The proof (which entails an initial consistency proof for the implicitly defined extremum estimate $\hat\theta_3$) is omitted because it combines relatively standard arguments with elements of the proof of Theorem 1, notably concerning the $e_t(\theta_0) - \sigma_0\varepsilon_t$ issue. Our treatment of this would also lead to a theorem for M-estimates of $\theta_0$ minimizing (1.14) in which $\rho(s)$ is a completely specified function, not necessarily $-\log g(s)$, but we omit this to conserve on space, and because the efficiency improvement of the paper's title would in general not be achieved.

Theorems 1 and 2 suggest locally more powerful (Wald-type) tests on $\theta_{01}$ than those implied by CLTs for Whittle estimates. For example, the hypothesis of short memory, $\xi_0 = 0$, can be efficiently tested, as can, say, the significance of AR coefficients in a FARIMA($p_{11}$, $\xi_0$, 0), for any unknown $\xi_0 > -\frac{1}{2}$. We can also efficiently investigate the question of relative success of deterministic and stochastic components in describing trending time series. For example, we can apply the theorems to test $\theta_{02} = 0$, or, with $p_2 = 1$, $\chi_1 = \tau$, test $\xi_0 = \tau + \frac{1}{2}$ against the one-sided alternative $\xi_0 > \tau + \frac{1}{2}$ [see the discussion after (1.9)]; in the first case rejection implies a significant deterministic trend, and in the latter, a dominant stochastic one. Tests based on $\hat\theta_2$ are in general more powerful than those based on least squares (see [31]) or generalized least squares (see [7]).

4. Finite sample performance. A small Monte Carlo study was carried out to investigate the success of our semiparametric estimates in small and moderate samples. Along with the value of $n$, major influential features seem likely to be the form of $g(s)$, the value of $\xi_0$ and the choice of $\phi$ and $L$.

We focused on the simple FARIMA(0, $\xi_0$, 0) model for $y_t$ (knowing $\mu^T z_t \equiv 0$) for:

(i) $\xi_0 = -0.25$ (antipersistent),

(ii) $\xi_0 = 0.25$ (stationary with long memory),

(iii) $\xi_0 = 0.75$ (nonstationary but mean-reverting),

(iv) $\xi_0 = 1.25$ (nonstationary, non-mean-reverting).

For $\varepsilon_t$ we considered the following distributions [the scalings referred to producing $\mathrm{var}(\varepsilon_t) = 1$]:

(a) $N(0,1)$,

(b) $0.5N(-3,1) + 0.5N(3,1)$,

(c) (scaled) $0.05N(0,25) + 0.95N(0,1)$,

(d) (scaled) Laplace,

(e) (scaled) $t_5$.

These were mostly chosen for the sake of consistency with other Monte Carlo studies of adaptive estimates. The benchmark case (a), and the two (symmetric and asymmetric) mixed normal distributions (b) and (c), were used by Kreiss [19] in a stationary AR model, with kernel estimates of $\psi$, and by Newey [25] (in a cross-sectional regression model). Ling [22] used (b) in a FARIMA(0, $\xi_0$, 0) model with kernel estimates of $\psi$. Kreiss [19] also used (d). The point of (e) is that it only just satisfies the minimal fourth moment condition on $\varepsilon_t$, Assumption A2(a). Kernel approaches, from [3] and [28] for location and regression models for independent observations, through Kreiss [19], Drost, Klaassen and Werker [8] and Koul and Schick [18] for short-memory time series models, and Hallin, Taniguchi, Serroukh and Choy [15], Hallin and Serroukh [14] and Ling [22] for long-memory ones, have been popular in the adaptive estimation literature. Besides requiring choice of a kernel and bandwidth (analogous to our $\phi$ and $L$), they typically involve one or more forms of trimming, in part due to the presence of a kernel density estimate in the denominator of the estimate of $\psi(s)$, and sometimes sample splitting and discretization of the initial estimate. Theorem 1 of course implies semiparametric efficient estimates using series estimation for short-memory models. For $\phi$ we used both (2.4) and (2.5), and tried $L = 1, 2, 3, 4$, with $n = 64$ and 128. For $\tilde\xi = \tilde\theta$ and $\tilde\sigma^2$ Velasco and Robinson's [29] estimates were employed, with a cosine bell taper; this is sufficient to satisfy Assumption A7 for all $\xi_0$ considered, albeit unnecessary when $\xi_0 = \pm 0.25$.

We report the Monte Carlo relative efficiency measure MSE($\hat\xi$)/MSE($\tilde\xi$) (where $\hat\xi = \hat\theta$) on the basis of 1000 replications. Tables 1-5 present results for distributions (a)-(e), respectively, in case $n = 64$ only; generally asymptotic behavior was better approximated when $n = 128$. For $\varepsilon_t \sim N(0,1)$, $\hat\xi$ is efficient when $\phi(s) = s$ for all $L \ge 1$, the efficiency improvement achieved in Table 1 for $L = 1$ being due to the tapering in $\tilde\xi$; as anticipated, the unnecessarily complicated $\hat\xi$ based on larger $L$ makes matters somewhat worse. One expects relative efficiency to be roughly constant across $\xi_0$. The deviating results for $\xi_0 = -0.25$ and $\xi_0 = 1.25$ sometimes found in the tables are largely due to the following computational policy. The grid search to locate $\tilde\xi$ was confined to the interval $[-0.4, 1.75]$, and for the extreme $\xi_0$ some $\tilde\xi$ fell on the boundary (especially the lower one), while we correspondingly trimmed $\hat\xi < -0.4$ and $\hat\xi > 1.75$ to $\hat\xi = -0.4$ and $\hat\xi = 1.75$, respectively. This led to some underestimation of bias and variance, and consequent distortion of relative efficiency. However, there is considerable stability across $\xi_0$ in the symmetric mixed normal case (Table 2), and also small improvement with increasing $L$, but slight deterioration when $L = 4$ for the unbounded $\phi(s) = s$.



Table 1
$\varepsilon_t \sim N(0,1)$

                 $\phi(s) = s$                    $\phi(s) = s(1+s^2)^{-1/2}$
$\xi_0$     L=1    L=2    L=3    L=4        L=1    L=2    L=3    L=4
-0.25       0.62   0.62   0.62   0.62       0.66   0.67   0.63   0.65
 0.25       0.47   0.48   0.51   0.61       0.49   0.52   0.53   0.60
 0.75       0.46   0.49   0.53   0.62       0.50   0.54   0.55   0.60
 1.25       0.47   0.50   0.52   0.61       0.52   0.53   0.52   0.56

For all tables, Monte Carlo MSE($\hat\xi$)/MSE($\tilde\xi$) with $n = 64$ and 1000 replications.



Table 2
$\varepsilon_t \sim 0.5N(-3,1) + 0.5N(3,1)$

                 $\phi(s) = s$                    $\phi(s) = s(1+s^2)^{-1/2}$
$\xi_0$     L=1    L=2    L=3    L=4        L=1    L=2    L=3    L=4
-0.25       0.92   0.92   0.83   0.90       0.94   0.93   0.82   0.83
 0.25       0.90   0.91   0.89   0.93       0.91   0.91   0.88   0.89
 0.75       0.90   0.91   0.89   0.94       0.90   0.92   0.89   0.89
 1.25       0.88   0.89   0.88   0.92       0.89   0.89   0.87   0.87










Table 3
$\varepsilon_t \sim$ (scaled) $0.05N(0,25) + 0.95N(0,1)$

                    $\phi(s) = s$               $\phi(s) = s(1+s^2)^{-1/2}$
$\zeta_0$    L=1    L=2    L=3    L=4      L=1    L=2    L=3    L=4
-0.25        0.71   0.71   0.62   0.77     0.81   0.76   0.63   0.70
 0.25        0.84   0.76   0.65   0.74     0.77   0.67   0.60   0.54
 0.75        0.85   0.79   0.70   0.79     0.80   0.78   0.69   0.63
 1.25        1.01   0.96   0.81   0.82     0.91   0.83   0.74   0.68
Table 4
$\varepsilon_t \sim$ (scaled) Laplace

                    $\phi(s) = s$               $\phi(s) = s(1+s^2)^{-1/2}$
$\zeta_0$    L=1    L=2    L=3    L=4      L=1    L=2    L=3    L=4
-0.25        1.07   0.85   0.92   0.96     1.04   0.90   0.60   0.61
 0.25        0.89   0.60   0.58   0.87     0.78   0.62   0.65   0.67
 0.75        0.56   0.52   0.55   0.81     0.51   0.53   0.53   0.54
 1.25        0.28   0.23   0.23   0.86     0.32   0.26   0.28   0.38










Table 5
$\varepsilon_t \sim$ (scaled) $t_5$

                    $\phi(s) = s$               $\phi(s) = s(1+s^2)^{-1/2}$
$\zeta_0$    L=1    L=2    L=3    L=4      L=1    L=2    L=3    L=4
-0.25        0.58   0.54   0.53   0.65     0.55   0.53   0.55   0.60
 0.25        0.56   0.56   0.57   0.74     0.51   0.54   0.55   0.58
 0.75        0.58   0.58   0.62   0.75     0.51   0.56   0.57   0.61
 1.25        0.63   0.61   0.60   0.69     0.54   0.55   0.52   0.53



We find this also in the asymmetric mixed normal case (Table 3), though
for the bounded $\phi(s) = s(1+s^2)^{-1/2}$, mainly the improvement continues to
$L = 4$, and its magnitude, at each increase of $L$, is notable. For the Laplace
distribution (Table 4) there is notable sensitivity to $\zeta_0$, though increasing $L$
tends to improve efficiency, at least up to $L = 3$. For the $t_5$ distribution
(Table 5) only small improvements, if any, were recorded after $L = 1$, as is not
surprising for this small sample size, as asymptotic relative efficiency is 0.8;
the deterioration with $\phi(s) = s$ at $L = 4$ is also not surprising due to the
heavy tails. The results taken as a whole seem fairly encouraging, especially
as the truncation (1.13) in computing residuals, which looms large in the
theoretical component of this paper, would be expected to have some finite
sample effect on the estimates in our fractional setting.
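The figure 0.8 quoted for $t_5$ can be checked numerically. The sketch below assumes that the asymptotic relative efficiency in question is $1/\{\mathrm{Var}(\varepsilon_t)\,E\psi(\varepsilon_t)^2\}$, with $\psi = -g'/g$ the score of the innovation density $g$; this ratio does not depend on the scaling of $\varepsilon_t$. The function names and the integration range are illustrative choices, not part of the paper.

```python
import math

def t_density(x, nu):
    """Student-t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_score(x, nu):
    """psi(x) = -g'(x)/g(x) for the Student-t density g."""
    return (nu + 1) * x / (nu + x * x)

def midpoint(f, lo, hi, steps=200_000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def relative_efficiency_t(nu):
    """1 / (variance * Fisher information for location), for nu > 2."""
    info = midpoint(lambda x: t_score(x, nu) ** 2 * t_density(x, nu), -400.0, 400.0)
    variance = nu / (nu - 2)
    return 1.0 / (variance * info)

def relative_efficiency_laplace():
    """Unit-scale Laplace: variance 2, score sign(x), information 1."""
    return 1.0 / (2.0 * 1.0)

are_t5 = relative_efficiency_t(5)
print(round(are_t5, 4), relative_efficiency_laplace())
```

Under the same assumed definition the Laplace value is 0.5, so some of the smaller ratios in Table 4 fall below the asymptotic value, in line with the distortion from trimming noted earlier.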

5. Final comments. In various stationary, short-memory time series mod- 
els, Kreiss [19], Drost, Klaassen and Werker [8], Koul and Schick [18] and 
others developed local asymptotic normality (LAN) and local asymptotic 
minimaxity (LAM) theory of Le Cam [20] and Hajek [12] to establish
$\sqrt{n}$-consistent, asymptotically normal and asymptotically efficient estimates,
and, further, adaptive estimates that achieve the same properties in the
presence of nonparametric $g$. A similar approach was followed by Hallin
et al. [15], Hallin and Serroukh [14] and Ling [22] in the case of stationary
and nonstationary fractional models. LAN theory commences from a
log-likelihood ratio, but in view of the difficulty in constructing likelihoods in
a general non-Gaussian setting, the latter authors commenced not from the
likelihood for $y_1, \ldots, y_n$ but from a "likelihood" for $y_1, \ldots, y_n$ and the infinite
set of unobservable variables $\varepsilon_t$, $t \le 0$, in terms of the density $g$ of $\varepsilon_t$, or a
"conditional likelihood" for $y_1, \ldots, y_n$ given the $\varepsilon_t$, $t \le 0$, or the $y_t$, $t \le 0$.
We do not employ such constructions and do not establish local optimality
properties. However, the M-estimate efficiency bound we achieve is of course
the same as the asymptotic variance resulting from a LAM/LAN approach.

Another motivation for our more elementary efficiency criterion is to allow
space to focus on the main technical difficulty distinguishing asymptotic
distribution theory for fractional models from that for short-memory
ones. This is due to the need to approximate the truncated AR transforms
$E_t = E_t(\theta_0)$ [see (1.13)] by scaled innovations $\sigma_0\varepsilon_t$. Consider a
simplified version of the problem in which $y_t = x_t$ a priori, so $\theta = \theta_1$, and define
$\delta_t = E_t - \sigma_0\varepsilon_t$. In the following section (relying heavily on Lemmas 13 and 14
of Section 7) we find that $E|\delta_t|^r \le Ct^{-r/2}$, $r \ge 2$, given a sufficient moment
condition on $\varepsilon_t$. This property is useful in our proof that $E_t$ can be replaced
by $\sigma_0\varepsilon_t$ in $\tilde a^{(L)}(E(\theta_0)/\sigma_0)$ (see Lemma 19). In some cases it is possible to
show that the upper bound provides a sharp rate. Consider the stationary
FARIMA$(0, \zeta_0, 0)$ (cf. [14]), where $0 < \zeta_0 < \frac{1}{2}$ and $v_t^{\#} = v_t$, $t \in Z$. Noting
that $\mathrm{cov}(x_0, x_j) \ge j^{2\zeta_0 - 1}/C$, $|\alpha_j(\zeta_0)| \ge j^{-\zeta_0 - 1}/C$ for $j > 0$,

oo oo 

^i^t) =5m"i(^o)afc(Co)cov(2;j,j;fc) 

j=t k=t 

oo oo 

>^"' EE r««-^fe-^°-^u-fep«°-i 

j=t k=t 
l<\j-k\<t 

oo t+j 

j=t k=t+l 
2t 

j=t 

>{ct)-\ 

(This contrasts with the exponential rate occurring with ARMA models.) 
In this stationary FARIMA$(0, \zeta_0, 0)$,

t-i 

= E "i('?0)2;t-j - (TQEt 
j=0 
t-1

(5.1) =J2'^ji^o)vt-j - (Toet 

j=0 

oo 

= -Y,at+j{Co)vt-j- 
j=t 

In our "asymptotically stationary" version of the FARIMA(0,^0)0), also 
with < ^0 < ^5 we have xt = xf , but again (5.1) results, from (1.4), (1.10), 
(1.11) and Lemma 5 of Section 7. In this connection, note that for general 
Ling [22] took xt = A-^^uf + vtl{t < 0) in place of our (1.2), but this 
different prescription of xt for t <0 makes no difference to et, which depends 
on Xs for s > 1 only. 
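The contrast between the fractional and ARMA cases drawn above rests on how slowly the autoregressive coefficients decay. The following sketch assumes the pure FARIMA$(0, \zeta_0, 0)$ case, in which the coefficients $\alpha_j(\zeta_0)$ are those of $s^j$ in $(1 - s)^{\zeta_0}$; the value $\zeta_0 = 0.25$, the AR(1) coefficient 0.5 used for comparison and the function name are illustrative assumptions.

```python
import math

def frac_diff_coeffs(zeta, n):
    """Coefficients of s**j, j = 0, ..., n-1, in the expansion of (1 - s)**zeta."""
    coeffs = [1.0]
    for j in range(1, n):
        # Ratio of successive binomial-type coefficients.
        coeffs.append(coeffs[-1] * (j - 1 - zeta) / j)
    return coeffs

zeta0 = 0.25
alpha = frac_diff_coeffs(zeta0, 100_001)

# Stirling-type approximation: alpha_j ~ j**(-zeta0 - 1) / Gamma(-zeta0).
j = 100_000
ratio = alpha[j] * math.gamma(-zeta0) * j ** (zeta0 + 1)

# Weight discarded by truncating after 1000 lags: hyperbolic versus exponential.
fractional_tail = sum(abs(a) for a in alpha[1000:])
ar1_tail = 0.5 ** 1000 / (1 - 0.5)
print(ratio, fractional_tail, ar1_tail)
```

The discarded weight is negligible to machine precision for the AR(1) comparison but remains clearly visible for the fractional coefficients, which is the informal reason the truncation in (1.13) requires separate treatment.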

The above upper bound for $E|\delta_t|^r$, combined with the Schwarz inequality,
is insufficient to deal completely with the replacement of $E_t$ by $\sigma_0\varepsilon_t$, even
when $\psi$ is smooth. Staying with the case $y_t = x_t$ a priori, the proofs of
Theorems 1 and 2 entail establishing asymptotic normality of a quantity
of the form $c_{1n} = n^{-1/2}\sum_{t=1}^{n}\psi(E_t)h_t$, where $h_t$ is $\{\varepsilon_s, s \le t-1\}$-measurable
and has finite variance; $c_{1n}$ is called a "central sequence" by Hallin et al. [15]
[see their (2.15) and (3.11)] and Hallin and Serroukh [14] [see their (2.4)].
Asymptotic normality of $c_{2n} = n^{-1/2}\sum_{t=1}^{n}\psi(\varepsilon_t)h_t$ follows straightforwardly
from a martingale CLT. This leaves the relatively difficult task of showing
that $c_{1n} - c_{2n} = o_p(1)$. In fact, our proof does not directly consider $c_{1n} - c_{2n}$
because we do not assume $\psi$ is smooth; we instead approximate the $E_t$
by the $\sigma_0\varepsilon_t$ within the smooth estimate of $\psi$ and then appeal to mean
square approximation of $\psi(\varepsilon_t)$ by its least-squares projection on the $\phi(\varepsilon_t)^{\ell}$,
$\ell = 1, \ldots, L$, as $L \to \infty$, as in [25]. However, for this, $S_n = n^{-1/2}\sum_{t=1}^{n}\delta_t h_t$
[i.e., $c_{1n} - c_{2n}$ with $\psi(x)$ replaced by $x$] is relevant, and the sharper the
bound we obtain for it the weaker some other conditions can be; we obtain
$S_n = O_p((\log n)^{3/2} n^{-1/2})$.

The same kind of issue arises in theory for Whittle estimation. For
short-memory stationary processes, with $\zeta_0 = 0$, Hannan [16] established the CLT
for various Whittle estimates. His proof does not work under stationary long
memory, $0 < \zeta_0 < \frac{1}{2}$, due to the bad behavior of the periodogram and
spectral density at low frequencies. However, in this case Fox and Taqqu [9],
Dahlhaus [6] and Giraitis and Surgailis [11] delicately exploited a kind of
balance between these quantities in order to establish CLTs. The CSS
estimate minimizing $\sum_{t=1}^{n} E_t^2(\theta)$ [see Remark 5 in Section 3 concerning (1.14)] is
not one of those considered by these authors, but its CLT requires showing
$S_n = o_p(1)$, which entails a similar challenge to results they established for the
somewhat different quadratic forms arising from their parameter estimates.
Our results for replacing $E_t$ by $\sigma_0\varepsilon_t$ can be employed to provide a proof of
asymptotic normality of the CSS version of the Whittle estimate. Whittle
and adaptive estimation are both areas in which asymptotic results are
qualitatively the same across short and long memory, but sufficient methods of
proof significantly differ.

6. Proof of Theorem 1. The consistency of the covariance matrix
estimates is implied by the proof of the CLT. By far the most significant
features of this are accomplished in the lemmas in the following section.
Their application is mostly relatively straightforward, and is thus described
here in abbreviated form. For notational convenience we now write $\theta_3 = \sigma$
and augment $\theta$ as $\theta = (\theta_1^T, \theta_2^T, \theta_3)^T$. We also abbreviate $\sum_{t=1}^{n}$ to $\sum_t$, and
$E_t(\theta_0)$, $E(\theta_0)$, $E'_{ti}(\theta_0)$ to $E_t$, $E$, $E'_{ti}$, respectively, $i = 1, 2$. By the mean value
theorem, for $i = 1, 2$,



70i 



Jl{0) 



Slu 



+ 



MO) 



where, with [SLii{6), 3^2(0), SusiO)] = {d / d9'^)rLi{0), each row_of Suj is 
formed from the corresponding row of SLij{9) by replacing hy 9 such that 
11^ - ^oll < 11^ - ^oll where P|| = {tr(A'^^)}V2. ^^.j^g ^ ^ ^1/2^ 
D2n = Dn and define N = {0:\\Din{9i - 9Qi)\\ < 1, i = 1,2,3}. The result 
follows if 

snv\\Dr^{Rm-RM}Dr^^^ 



(6.1 
(6.2 
(6.3 
(6.4 

(6.5 

(6.6 
(6.7 

(6.8) 
where 



0, 



sup \\Dr^{Su,{e) - SL^,{eo)}DJ^\\ ^ 0, 



1,2, 
l,2,j = 1,2,3, 



sup|Jl(0)- Jl(0o 
DZ.'R.{Oo)Dr 



{R^{eo)JLiOo)}-^SL 



^1^0, 
p 



^ = 1,2, 

i = l,2,j = l,2,3. 



iV 



J^i 
JVL2 



An'K.(eo)-r.}^0, 



1,2, 



'42 5 
with 4i = {d/de']^^^)a{B- ^!+V^^o = l{B- vo)et.

The most difficult and distinctive problems occur in (6.8) for i = 1, which 
faces the ej — cfQEt problem, as well as the increasing L, in the presence 
of normalization only by D^^. The first of these aspects is also in (6.1) 
and (6.4), and both are in (6.2), (6.3), (6.5) and (6.6), but the normal- 
izations make (6.4)-(6.6) much easier to deal with and the proof details 
are otherwise relatively standard, albeit lengthy. The same may also be 
said for (6.1)-(6.3), except for the approximation of the fractional differ- 
ence A^" by for |^ — ^ n"^/^, bearing in mind that "nonstationary" 
values of ^, ^.re permitted. The basic steps in proving (6.1)-(6.3) are il- 
lustrated by the least complicated case (6.1). By elementary inequalities it 
suffices to show that sup_;\^Et II A>i^(4i(^) - eli(6'o))f ^0, z = l,2. Write 
a = a{B;6^~^), a' = a'{B;9^~^) with aojC^o denoting these quantities at 
u = uq. For i = 2, it suffices to apply Lemmas 1, 2, 3 and (with m = ^q) 
4, the jth elements of ao(A^ — A^'')z2t and (a — ao)A^'-''Z2t being, respec- 
tively, 0{n~^/'^ (logt)t^^~^°) and 0{n~^/'^t^i~^°) uniformly in AA, noting that 
^0 > — ^ and Xj > Co — ^ implies Xj > and Co < Xj + 1- For i = I, the 
terms in Z2t are dealt with similarly, while Lemmas 1-4 give, for example, 
a'oiA^ - A5o)(si + /i*t5o) = 0(n-V2(iogt)2) and {a' - a'Q)A^o{st + //**««) = 
0(re~^/^) uniformly in Af. In the above we apply first Lemma 3, then 
Lemma 1 and then Lemma 2, noting that in case (ii) of Lemma 1 must 
be used (either for a leading term or remainder) the coefficient of in 
the expansion of — log(l — s), and thus of (— log(l — s)Y , is positive for 
all J > 1 , so for nonnegative sequences gt,ht, such that gt < ht, we have 
\{— log Ay gt\ < |(— log A)''2;t|. So far as contributions from xt are concerned, 
from Lemma 5 

t-i . . 

sup ||(a' - a'o)A«Oxi|| < ^ sup ||a;. - a^,|| \{\A'^"v*^\ + |(log A)A?«^;f .|}, 
JV i Af J 

where a'j, a'^j are the jth Fourier coefficients of a' , Og. By the mean value 
theorem and Lemma 6 this has second moment 0(?i~^). The same result 
holds for ao(A^ — A^°)xt after taking m = niQ in Lemma 4, noting that its 
supremum over M is bounded by 




and applying Lemmas 5 and 6. The proof of (6.1) is readily completed. 

Before coming to (6.8), we briefly discuss (6.7). Consider variates U = 
(n-i/2rf , (Z^-Va)^)^, V = \^ {EUU^y^'^U for a {pi + P2) x 1 vector A 
such that A^A = 1. We have EV = 0, EV"^ = 1, since Eip{eQ) = and eJi is 
(6.9) E

t 

5:V'(q'){n-i||4f l(||V.(e,)4|| > 5nV2) 
T


■Jli " 










1^2. 



(6.10) 



+ p-ii?;2fl(||VXeOAri^;2ll><^)}^0 



for any $\delta > 0$. The proof of (6.9) follows from Lemmas 1 and 3 and
approximating sums by integrals, while that of (6.10) follows from stationarity and
finite variance of $\psi(\varepsilon_t)$ and $\varepsilon'_{t1}$ and the slowly changing character of $z_{2t}$.

We prove (6.8) only for $i = 1$, the case $i = 2$ involving some of the same
steps but being much easier. Define $\Phi^{(L)}(s) = \phi^{(L)}(s) - E\phi^{(L)}(\varepsilon_t)$, $W^{(L)} =
E\{\Phi^{(L)}(\varepsilon_t)\Phi^{(L)}(\varepsilon_t)^T\}$. It follows from Lemma 8 that $W^{(L)}$ is nonsingular,
and thence we define $a^{(L)} = W^{(L)-1}w^{(L)}$ where $w^{(L)} = E\{\phi'^{(L)}(\varepsilon_t)\} =
E\{\phi^{(L)}(\varepsilon_t)\psi(\varepsilon_t)\}$, by integration by parts, as in [2] and as justified under our
conditions by Lemma 2.2 of [25]. Defining also $\psi^{(L)}(\varepsilon_t; a^{(L)}) = a^{(L)T}\Phi^{(L)}(\varepsilon_t)$,
we have
j=ii=i 



where 



and 



n 



-1/2 



jt 



Bit 

B2t 

Bst 
Bit 



^W(e^;aW)_^(e,), 
V'(^)(ei;a(^)(e))-^(^)(ei;a(^)), 



#)(0o,cTo)-V^^)(ei;a(^)(e)), 
Cit = o-Qs'-ti, C2t = E'^i — aoe'^i- 

Since e'f.^ is {eg, s < t}-measurable and £'||eQ]^|p < C||rJi|| < oo, while B2t has 
zero mean, £'||j42i|P < CEB2Q -^0 as L ^ 00 from [10], pages 74-77, and [25], 
Lemma 2.2, since the moments of 0(eo) characterize its distribution under 
Assumptions A2 and AS. 
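The population projection coefficients $a^{(L)} = W^{(L)-1}w^{(L)}$ can be computed exactly in a simple special case. The sketch below assumes $\phi(s) = s$, so that $\phi^{(L)}(s) = (s, s^2, \ldots, s^L)^T$, and $\varepsilon_t \sim N(0,1)$, for which the score is $\psi(s) = s$ and the projection should return $a^{(L)} = (1, 0, \ldots, 0)^T$. The helper names and the small linear solver are illustrative and are not taken from the paper.

```python
def normal_moment(k):
    """E[eps**k] for eps ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    if k % 2:
        return 0.0
    out = 1.0
    for m in range(k - 1, 0, -2):
        out *= m
    return out

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def series_score_coefficients(L, moment=normal_moment):
    """a = W^{-1} w with W = Cov(eps**i, eps**j) and w_i = E[i * eps**(i - 1)]."""
    W = [[moment(i + j) - moment(i) * moment(j) for j in range(1, L + 1)]
         for i in range(1, L + 1)]
    w = [i * moment(i - 1) for i in range(1, L + 1)]
    return solve(W, w)

a = series_score_coefficients(3)
print(a)
```

Here $w^{(L)}$ is formed from the derivative expression $E\{\phi'^{(L)}(\varepsilon_t)\}$, so the score itself is never needed, which is the point of the integration by parts step.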

Before discussing other $A_{ij}$ define

$\mu_a = 1 + E\{|\varepsilon_t|^a 1(|\varepsilon_t| > 1)\}$,
for $a \ge 0$, and the following sequences:
PaL = CL if a = 0, 

= {CLf^l'^ if a > and Assumption A2(b) holds, 
= if a > and Assumption A2(c) holds, 

suppressing reference in paL to the arbitrarily large constant C; and also 
TTi = (logL)7?2^1(^ < 1) + (LlogL)ry2^1((^ = 1) + (logL)(77(^)2^l((^ > 1), 
for L > 1. 

Write ^31 = {bm - b2nhn){a'^^^ (e) - a^^) } - ^sn^Sna^^^ , where = n-^l'^a^ Y.t {^tf , 

&2n = n-iEf4, bsn = n-^/^aoj:t^^^\£tV- We have E\ct){eoW < p^r, and 
thus from Lemma 9 

L 

E\\binf + EWhnf < CY,iE\Woif + l)^0''(eo) < P2.L. 

Since 62™ = Op(n~^/^ log?i) from Lemma 17, we deduce from Lemma 10 that 
(6.10) 

Before imposing Assumption A9, we estimate A41, which can be written 



. ^( Lp2K.LT^L , jl/2 1/2 



(6.11) 
(6.12) 



a^'-XE/a^) 



+ n-"'aoY.e'a^^^\e,na^^\E/a,)-a^^\e)]. 



The square-bracketed quantity in (6.11) has norm bounded by 



(6.13) fx: 

\i=i 



2\ 1/2 



+ n 



.^=1 \ t 



2n 1/2 



where 5u = (piiEt/ao) - (piiet). We have 



(6.14) 



6u = (l)i{et)dt + l(l)'^{et)dt, 



where \et - et\ < \dt\, dt = Et/ao - et- Now et = a(B;%)(si + p*t^o ^^^^ 
and from Lemma 5 [see also (1-13)] 



a 



{B; 9oi)xt = a{B; 9Qi)vf = aoet - ^ at+j{6Q^')v-j = aoSt + du, 

j=0 



where 



j=l k=0 
where Pj{9Qi^) is the coefRcient of in a{s;6Qi^) ^. Since a{B;6o)st =
o(t-i/2) and a{B;eoi)t^'' = a(l; 6'J~^)r(^o + 1) + 0{t~^) from Lemma 1, it 
fohows that 



(6.15) 



dt = dit + d2 + d3 + o{t 



where ^2 = n ^ J^j^oiJ^t ^jt)^-jj ds = n ^ J2t ^t- From Lemmas 13, 14 and 18, 

for 2 < r < 4 under Assumption A2(a) and r > 4 under Assumptions A2(b) and A2(c) 



(6.16) 
(6.17) 



E\d2\' + E\d3\' < (Cr)2'-n-'^/V;('+, 



where r+ is the smallest even integer such that r < r+. Returning to (6.13), 
we have 



(6.18) 

(6.19) 

(6.20) 

(6.21) 

(6.22) 
Now 



< 



J2^M'M)-E<p',{eo)}du 
Y.eMiiet)-EcP',{so)} 



+ 



i\d2\ + \d3\) 



+ \E<l>',{eo)\ 
+ \E<P'Aeo)\ 



t 

E4i 



+ 



\4{s)\=£W{s)^'-\s)\ 
(6.23) < Ci{l + \cl){s)f){l{\s\ < 1) + \s\^^^-'h{\s\ > 1)} 

< C£{l{\s\ < 1) + \s\^^^~^+^^h{\s\ > 1)}, 
and since et is independent of s'^^du, the right-hand side of (6.18) is 

O, (^{Ect>',{eofY" \\'Ed\,fl^^ = 0,{ly}il,^^^ logn), 

using (6.16). The same bound applies to (6.19)-(6.21), proceeding similarly 
and using respectively (6.17), Lemma 16, and (6.17) with Lemma 17; note 
that it is the second factor in (6.20) which leads to the main work in handling 
the quantity $S_n$ discussed in Section 5. So far as (6.22) is concerned, note
that as in (6.23),

< Cf{l{\s\ < 1) + |sr(^-™)l(|s| > 1)}, 
so by the $c_r$-inequality ([23], page 157) (6.22) is bounded by

(6.24) c^'+'^'E +4i^*r^'^'^^ + irfi*r^'+'^^+'} 

+ c^'^H' \Hi \m - du)\i + ietr('+^)) 

(6.25) * 



+ \dt - dit 



}• 



By (6.16) and Holder's and Jensen's inequalities, (6.24) has expectation 
bounded by 



C 



logn + E(^|dit| 



2K.{e+K)+4a/2 



r£ being the smallest integer such that > 2k(£ + K) + 4. From (6.14) and 
(6.17), (6.25) is of smaller order in probability. It follows from Lemma 9 that 



E 



2\ 1/2 



:0,((CL)™p^/llogn). 



By a similar but easier proof, the second term in (6.13) has the same bound, 
and by Lemmas 10 and 19, 

{6.n) = Op{{CLf^''+'p2.L7TLn-'/^logn). 

Next, from similar but simpler arguments to those above. 



n 



-1/2 



Application of Lemma 9 indicates that (6.12) is 
Thus 

Ail = Op{p2^L7TL{p2nL7TLL^ + (CL)2«^+3 

+ P2.L7rL{CLr'^+\-'/')n-'/'logn). 



(6.26) 
Comparison of (6.10) and (6.26) indicates that $A_{31}$ is dominated by $A_{41}$,
whose behavior under Assumption A9 we thus now consider. Take $\kappa = 0$.
From Lemma 9, under Assumption A9(a)

f 41ogL + loglogn + 21og7rL 1 

logn 



Op{ exp , 

log n 2 , 

which is Op(l) if hm sup log tt^/ logn < ^, as is clearly implied by (3.4). 
Now take k> under Assumption A2(b). From Lemma 9, under Assump- 
tion A9(b) 

^41 = Op((i^"''/"+' + i'"''('+'/"^+')n-i/2iogn) = Op(l), 

on proceeding as before. Under Assumption A2(c), Lemma 9 and Assump- 
tion A9(c) give 

^41 =Op((CL)2-^n-i/2iog„) = Op(l). 
To consider A12, we can proceed as earlier to write 

E^'i - 4i = Dit + D2 + D3 + (t-i/2 logt), 

where 

00 00 / \ 

i=i j=o\ t I t 

and ~Xjt = Ei=oid/de[-'^^)ak+MV)Pj-k{0iV)- Using (7.23) and (7.24) of 

Lemma 13, we deduce that |Ajf | < C{logt)j'^H-'^°-'^ , j < t, and |Ajt| < C{logt)j'^"~^ max(j^'^o,t"^o), 
j > t, and then proceeding as in Lemma 14, that ET=o^t < Ct-^log^t, 
Ei^o(Er=i ~Xjt? < Cnlog^n. Noting that E(j:^^{et) x Duf <CY.tEDl, 
using also Lemma 17 and proceeding as in the proof for (6.11), it follows 
that A12 = Op(n-V2iog3/2n). 

The remainder of the proof of (6.8) with i = 1 deals in similar if easier 
ways with quantities already introduced and is thus omitted. □ 

7. Technical lemmas. To simplify lemma statements, we take it for granted
that, where needed, Assumptions A1-A9 hold.

Part (ii) of the following lemma is only needed to show that st in (1.9) 
contributes negligibly, in particular when it includes ti < ^0 ~ 1 • 

Lemma 1. (i) For wt = f with 7 > — 1 and ^ G (— + 1); 

= + Oit^^^-' + t^-™"il(e > 0)), 

r(7-^ + 1) 
as $t \to \infty$, where $m$ is the integer such that $\zeta - 1 \le m < \zeta$.
(ii) For wt = {logtyt^ , r>Q, i>-\, 



A^wf = 0{t 



max(7, — 1)— g+Q 



as t — > OO, 



for any 5 > 0. 



Proof, (i) The proof when ,^ is a nonnegative integer is straightforward, 
so we assume this is not the case. We have 



(7.1) 



5]/A,(-0 = 0, j = 0,...,m, 



when m > and ^ > 0, (1 — s)^ and its first m derivatives in s being zero at 
s = l. With Ofc = Afc(-7), 



i-l 



(7.2) 



j=0 

t—1 oo 

=t^^A,(-oE«^^ovo' 

I oo 

= -t^5](t-fc)-^afc^/A,(-Ol(m>0) 
k j=t 

+ t7^(i_fc)-fca,|]/A,(-e), 

k j=0 

where Efc = T,T=o^ T,k = E^max(m+i,o) and we apply (7.1). By Stirhng's 
approximation 



(7.3) 



A,(-e) 



r(-6 



J>1, 



SO (7.2) differs from 



(7.4) 



I k j=t 



t-i 



+ Y.{t-k)-'akY.j'~'-' 
k i=o 
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by 



(7.5) O t^5]t-'=|afc|^/-«-2^(7n>0)+t^^t-^-|afc|^j 

V k j=t k j=0 

Now 

|]r" = ti-"/(l-a) + 0(t-"), a<l, 



(7.6) 



j=Q 



^r" = ti-"/(a-l) + 0(t-"), a>l. 

Thus (7.5) is 

oft7-€-iV-M^i(^>o) 



oU-^-4j2\ak\t{m>Q) 



k 

II 



+ + + Ct^"'"-'l«m+i|l(m > -1)| j , 

where X^fc' = Z]fcLmax(m+2,o) • '^'^^ ^'^^t braces is finite because m and the 

are, while the second sum is finite because \ak\ < Ck^'^"^ . Thus since 7 > —1, 
(7.5) is 0(t'^-™-^) for ^ > and Oif^-^) for ^ < 0. Applying (7.6) again, 
(7.4) is 

■ 'Y — f CXD 



and the leading term is {r(7 + l)/r(7 — ^ + 1)}^'''"'', from [30], page 260. 
(ii) We have 

A^w* = J2 A,(-e){iog(t - j)V{t - jy. 

j=0 

Noting that Aj(-^) = 0(i"^"^) and (7.1) holds with A; = for ^ > 0, 
^ A, (-0{log(t - j)nt - jV ~ (logt)'-t^ ^^(-0 = 0{t^+'^s"^) 

j=0 j=0 
for s = o{t), 5i > 0. On the other hand,



(7.8) 



X: A,(-C){log(t-i)r(t-jT 

j=s+l 

The sum on the right-hand side is 0{t^~^'^) for 7 > —1, 0((logt)) for 7 = —1 
and 0(1) for 7 < —1. Thus choosing s = ^^""^2/(5+1)^ y produces the re- 
sult. 

□ 

Lemma 2. For $w_t = t^{\gamma}$ and any integer $r \ge 0$, as $t \to \infty$

(7.7) $(-\log\Delta)^r w_t^{\#} \sim (\log t)^r t^{\gamma}$ for $\gamma > -1$,

$= O(t^{-1}(\log t)^{r-1}\{1(\gamma < -1) + (\log t)1(\gamma = -1)\})$

for $\gamma \le -1$.

Proof. Suppose (7.7) is true for a given r. Then as t — > co 

(7.9) (- log I\r+^wf ~ (- log A) (log tfwf = Y^r^{\og{t-j)Y{t- j)\ 
The difference between this and 

(7.10) {\ogtrY.r\t-jy 

is bounded by C(logt)''~-'^ times 

X:r Hiogt - iog(t - j)}{t - jy < iog(i - j/mt - jV- 

Splitting this into sums over j G [1, [t/2]] and j £ [[t/2] -|- l,t — 1], it is seen 
that the first of these is bounded by 

i=i 

since | log(l — x)| < a; for x G (0, ^), while the second is bounded by 

t-i 

Ct-'Y.\^ogij/t)\f'<Cf'logt. 
i=i 

The difference between (7.10) and 

(7.11) {logtyvY^r' 
is bounded by



t-1 



ciiogtyt-rY^r^i-j/tr - 1| < ciiogtyt-^. 

Then (7.11) ~ (logt)''"'~^i'^ as t ^ oo. For 7 < — 1, we can write 

{-iogAywf = J2af\t-jy, 

where af^ = 0({log(j + l)}'"-^j-i). Sphtting the sum as before, the first one 
isO((logt)'^i'^) and the second is ©((logt)'^"^*"^) for 7 < -1 and 0((logt)'^t-^] 
for 7 = — 1 . □ 
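The case $r = 1$, $\gamma > -1$ of Lemma 2 can be examined numerically. The sketch assumes that $w_t^{\#}$ denotes $w_t$ truncated to $t \ge 1$, and uses the expansion $-\log(1 - s) = \sum_{j \ge 1} s^j/j$, so that $(-\log\Delta)w_t^{\#} = \sum_{j=1}^{t-1} j^{-1}(t-j)^{\gamma}$; the value $\gamma = 0.5$, the time point and the function name are illustrative assumptions.

```python
import math

def neg_log_delta(gamma, t):
    """(-log Delta) applied at time t to w_s = s**gamma, truncated to s >= 1."""
    return sum((t - j) ** gamma / j for j in range(1, t))

gamma, t = 0.5, 200_000
ratio = neg_log_delta(gamma, t) / (math.log(t) * t ** gamma)
print(ratio)
```

The ratio approaches one only at a logarithmic rate, since the two sides differ by a term of order $t^{\gamma}$; it is therefore close to, but visibly below, one at this value of $t$.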

In the following four lemmas $b(e^{i\lambda})$ is taken to be a function with
absolutely convergent Fourier series, and $b_j = (2\pi)^{-1}\int_{-\pi}^{\pi} b(e^{i\lambda})e^{-ij\lambda}\,d\lambda$.

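Lemma 3. For $w_t = t^{\gamma}$,

$b(B)w_t^{\#} \sim b(1)t^{\gamma}$ as $t \to \infty$.

Proof. The left-hand side equals $t^{\gamma}\sum_{j=0}^{t-1} b_j + \sum_{j=0}^{t-1} b_j\{(t-j)^{\gamma} - t^{\gamma}\}$.
The first term differs by $o(t^{\gamma})$ from $b(1)t^{\gamma}$, and the second is bounded by

$\sum_{j=0}^{t-1} |b_j|\,|(t-j)^{\gamma} - t^{\gamma}| = o(t^{\gamma})$

from the Toeplitz lemma. □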
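A quick numerical check of Lemma 3 is possible for a particular filter. The sketch assumes the geometric weights $b_j = \rho^j$, for which $b(1) = 1/(1 - \rho)$, and again takes $w_t^{\#}$ to be $w_t$ truncated to $t \ge 1$; the values $\rho = 0.6$, $\gamma = 0.5$ and the function name are illustrative assumptions.

```python
def filtered_power(rho, gamma, t):
    """b(B) applied at time t to w_s = s**gamma 1(s >= 1), with b_j = rho**j."""
    return sum(rho ** j * (t - j) ** gamma for j in range(t))

rho, gamma, t = 0.6, 0.5, 100_000
b_at_one = 1.0 / (1.0 - rho)
ratio = filtered_power(rho, gamma, t) / (b_at_one * t ** gamma)
print(ratio)
```

With summable weights the contribution of lags comparable to $t$ is negligible, so the ratio is already very close to one at moderate $t$.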



Lemma 4. For a sequence wt such that wt = 0, t <0, and any integer 
r, as ^0 

(log A)^-(A^ - A^'>)b{B)wt = {logAY+^A^°b{B)wti^ - Co) 
^^'^'^^ t 1/2 

+ 0(^|^^(A™u;,_,)2| (e-eo)^) 

for m G (^0 - |,^o + 

Proof. By the mean value theorem the left-hand side of (7.12) is 

{logAY+'A^^^b{B)wt{^ - ^0) + ^ilogAY+%iB)A^wt{^ - ^of, 

for l^-^ol < l^-Col- The last term can be written i CJA"^^;^_j(C-^o)^ 
where Cj is the coefficient of s^ in the Taylor expansion of {log(l — s)}^^"^ x 




(1 — Prom Stirling's approximation, Cj ~ (logj)''"''^j™~'>~^ as j oo. 

Now m — ^<m — ^o + |C~Co|- The right-hand side of this is less than ^ if 
l?~?o|<^ — "i + Co) where the right-hand side of the latter inequality is 
positive. Thus for |^ — ^o| small enough, m — 1<— |. Then Y^fLi cj <oo 
for all r, so the proof is completed by the Cauchy inequality. □ 

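Lemma 5. For real $\zeta$ and $m_0$ defined by (1.2),

(7.13) $\Delta^{\zeta} b(B) x_t = \Delta^{\zeta - m_0} b(B) v_t^{\#}, \quad t \in Z.$

Proof. The left-hand side of (7.13) is

$\Delta^{\zeta} b(B) \Delta^{-m_0} v_t^{\#} = \Delta^{\zeta - m_0} b(B) v_t^{\#}, \quad t \in Z.$ □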

The next lemma gives a uniform bound for the variance of a process that 
is only "asymptotically stationary." 

Lemma 6. For all r > 0, and Co defined by (1.4), 

(7.14) E{{-logAYA'^%{L)vff <C <oo. 



Proof. The left-hand side of (7.14) is 
(7.15) 







2 






\l-e'^ 


J — IT 


j=0 





f{X)dX<C[J2\c,\ 

\j=0 



for Co > since |1 — e | """/(A) is integrable, Cj being the jth Pourier co- 
efficient of [{- log(l - e^^)Y{l - e*^)^o]6(e*^). The jth Pourier coefficient of 
the factor in braces is 0((logj)'^ j~^°~^), so since the bj are summable so are 
the Cj. Por Co < |1 - e*^|-2fo/(A) is bounded, so the left-hand side of (7.15) 
is bounded by Yl'o' c| < oo. □ 

Lemma 7. Let Sm be the m x m matrix with {j,k)th element {j,k > 1), 



even) 



nJ+'=-2 ^ 2(i + k- l)"^l(j + k 
Then for m sufficiently large, 

tr(5-i) < (2vr)-2 + \ log{ {2m " 3) (f^ - 1 



Proof. It is clear that, like $S_m$, $S_m^{-1}$ must have $(j,k)$th element that is
zero for all odd $j + k$. This immediately ensures the necessary property that
even rows (columns) of $S_m$ are orthogonal to odd rows (columns) of $S_m^{-1}$. It
then suffices to study the two square matrices $S_{1,m}$ and $S_{2,m}$ formed from,
respectively, the odd and even rows and columns of $S_m$. These exclude only
and all zero elements of $S_m$, and $S_m^{-1}$ is the $m \times m$ matrix whose
$(2j-1, 2k-1)$th element is the $(j,k)$th element of $S_{1,m}^{-1}$, whose $(2j, 2k)$th element is
the $(j,k)$th element of $S_{2,m}^{-1}$ and whose other elements are all zero. Thus it
suffices to consider $S_{1,m}^{-1}$ and $S_{2,m}^{-1}$, and indeed $\mathrm{tr}(S_m^{-1}) = \mathrm{tr}(S_{1,m}^{-1}) + \mathrm{tr}(S_{2,m}^{-1})$.
We take $m$ to be even; details for $m$ odd are only slightly different and since
we want a result only for large $m$ this outcome will clearly be unaffected.
$S_{1,m}$ and $S_{2,m}$ are both Cauchy matrices (see, e.g., [17], page 36), having
$(j,k)$th element of the form $(a_j + a_k)^{-1}$, in particular, $(j + k - \frac{3}{2})^{-1}$,
$(j + k - \frac{1}{2})^{-1}$, respectively. From Knuth [17], page 36, the $j$th diagonal elements
of $S_{1,m}^{-1}$, $S_{2,m}^{-1}$ are, respectively, $2U_1^2(j)/(4j - 3)$, $2U_2^2(j)/(4j - 1)$, where we

define, for real $s$,

nl<^<m/2{i + S-m^ 



Ui{s) 

U2{S) 



nl<^<m/2{i + S-l/2f 



Thus 

m/2 

tr{S-') = 2 ^{(4j - 3)-'Uf{j) + (4j - l)-^^|(i)} 
< I 2 + - log(2m - 3)1 max U?(j) 

[ 2 J l<j<m/2 

For se (0,m/2-l) 

TT(^ TK TT(^^^ (g + m/2-l/2)(m/2-g) ) 

The factor in braces is 2 — m{m — l)/{2s(2s — 1)}, which is negative for s < 
s{m) and positive for s > s{m), where s(?7i) = \ + {2m{m — 1) + l}^/^/4 ~ 
m/\/8 as m ^ 00. Thus, as m ^ 00 

(7.16) m» f/.O) r((l/2+l/V^)™-l/2) 



i<j<m/2 r(m/V8- l/2)r(?n/V8)r((l/2 - 1/V8)m+ 1)' 

Applying Stirling's approximation, that is, 

r{am+b) ~ (27r)i/2e-""(am)""^+''-i/2 
as 771 — > oo, and noting that

f (l + 2-VY+^-^^2^'^^^ V/^_. ,.1/2 

I (1-2-1/2)1-2-1/^ j 1+^ , 

(7.16) is (27r)^^r/'"(l + o(l)). In the same way it can be seen that U2{s) is 
maximized at {2m(m+ 1) + 1}^/"^ /A— ^ m/ \/8, whence maxi<j<„/2 ^2(i) ~ 
(27r)~^r7'"(l + o(l)) also. The proof is then routinely completed. □ 

Denote by $\lambda(A)$ the smallest eigenvalue of the matrix $A$.

Lemma 8. As $L \to \infty$,

$\lambda(W^{(L)})^{-1} = O(\pi_L)$.

Proof. The method of proof, given Lemma 7, is similar to one in [25], 
but we obtain a refinement. Define (s) = (1, cf)^^^ (s)'^)'^ , w\^^ = E{(f>^^^ {et) x 
(p^^\et)'^}, so W^^^ =PW\_^^P^, where the L x (L+ 1) matrix P consists 
of the last L rows of the (L + l)-rowed identity matrix. Then \(W^^'^) > 
Uwi^^MPP^) = Uwi^^). If (-1,1) C {(t){si),^{s2)) (which implies if < 
1), then [since ^'{s) is bounded on (si, ss)] A(Ty|^^) > X{Sl+i)/C > tr(S^|^)-VC7, 
where we use Sm defined as in Lemma 7, which can then be applied. Oth- 
erwise, w\^^ exceeds, by a nonnegative definite matrix, 

(7.17) r^"\(^)n(^)^d.= (^M^^lA r n(^).(-)^<i.A^, 

where u^^'^ = (1, u, . . . , u^)^ and A is the lower-triangular matrix with (i, j)th 
element (*.~^)i;^(si)*~-'{i;^>(s2) — ';^('Si)P~^, j < i. The smallest eigenvalue of (7.17) 
is no less than C7-1{(^(s2)-<A(si)}A(^^^)A(5l+i). Now A(AA^) > 
where by recursive calculation A~^ is seen to be lower-triangular with (i, j)th el- 
ement a'^ = (p ■){-</'(si)}^-n0(s2) - J < i- Thus 

L+l / i \ L+1 / i \ 2 L+1 

i=i \j=i I i=i \j=i I i=\ 

This is bounded by (1 — i/3^)~^ for 99 < 1, by L + 1 for 99 = 1 and by {^p^ — 

1)-1 X V92(L+1) foj, 1_ □ 

Lemma 9. For $a \ge 0$, $b \ge 0$,

(7.18) $\sum_{i=1}^{L} \mu_{ai+b} \le \rho_{aL}$.

Proof. In case $a = 0$, or $a > 0$ but Assumption A2(c) holds, this is
trivial. For $a > 0$ under Assumption A2(b), monotonic nondecrease of $\mu_a$ in
real $a$ implies that the left-hand side of (7.18) is bounded by

[aL+b] /^rx(a/K)L 

i=i ^ ^ ^ 

for any t £ (0, 1), and by Assumption A2(b) there exists such t that this is 
bounded by paL- D 

Lemma 10. Asn^oo, 

\\a(^^=0{Lplil7,L), 
(7-19) , ^ 




Proof. Write 

From (6.23), the Schwarz inequality and Lemma 9 

||^(L)||2 = j2f{E{4>'{eo)(t>'-\eo)}V < CL'Y.t'Mi+K) < L'p2.l. 
e=i 1=1 

Similarly, and from independence of the £t, 

L 

L 

E\\W^^\e) - T^(^)f < n-iEE^{'^(£o)'('+')} < {L/n)p^,L. 

k,£=l 

Now apply Lemma 8. □ 

Lemma 11. For j >0 let aj = Aj{d) for d < 1 and < C{j + 1)"^. 
Then the sequence J2k=o'^j-kPk, 3 > 0, has property Po{d). 



Proof. By Stirling's approximation aj has property Po{d), whence the 
proof is completed by splitting sums around j/2 and elementary bounding 
of each. □ 
Lemma 12. For j >0 let the sequence aj, j > 0, have property PQ{—d)
and for d> let J2'j^o Then for \d\ <1 the sequence 

j 

fc=0 

has property Pi[—d). 

Proof. We give the proof only of — 7j+i| < C{\og{j + the 
proof of |7j| < C{log(j + being similar and simpler. We have 



7i - Ij+i = E{(i + 1 - k)-^ - (i + 2 - k)-^}ak - (i + 1 



Jj a 



fc=0 



fc=j+i 

where j = [j/2]. The second term is bounded by Cj~'^~'^ and the third 
by C(logj)j~'^~^. For d < the first term is bounded by Cj~'^~'^ and for 
(i = by C(logj)j~'^~^. For d> we apply summation by parts to this first 
term and J2'^o "^j = to obtain the bound Cj~'^~'^ again. □ 

Lemma 13. Let the sequence aj, j > 0, have property PQ{—d) and the 
sequence j3j, j > 0, have property Po{e), and let 



oo 
j=0 



(7.20) 



^\(3j\ <oo ife = 0, 

j=0 



ife<0. 



E/5. = o 

j=0 

Then for \d\ <1, |e| < 1 it follows that for all j > 0, t>0, 

j 

< cft-'^-^ 



(7.21) 



k=0 



j<t, 



(7.22) <Cf-^max{j-'^,t-'^), j>t. 

If instead aj has property Pi{—d) and (7.20) is not imposed, 
j 



(7.23) 



k=0 



1 +\ -e.-d-l 



<C{log'+H)ft 



j<t, 



(7.24) 
Proof. We prove only (7.21) and (7.22), the proof of (7.23) and (7.24)
being very similar but notationally slightly more complex and less elegant. 
Write Sab = Y!k=a(^t+kPj-k- We have 



\Soj\<t~^~^Y.\l^k\<Cft 

k=0 



e j.~c!— 1 



e>0. 



This proves (7.21) for e > and all d. On the other hand, with j = [j/2], 
summation by parts gives 

i-i k j 

\Sqj\ < J2 - Pj-k-i\ Wt+i\ + \^t+k\ 

k=0 1=0 k=0 



(7.25) 

while 
(7.26) 



U=o 



^^.e-l^-a^ (i>0,ane. 



\Sj^,J<Cit + jr'~'f<Cj 



all d:e>0. 



This proves (7.22) for d > 0, e > since f~'^~^ < f~H~'^, j > t. For e < 

J — 1 oo oo 

fc=0 i=k+l k=j+l 



Since 



ET=oPj = 0- This is bounded by C{t~'^~^f+^ + t~'^~^f} < Cft 



for j < t, to prove (7.21) for e < and all d. For e < and all d 



Y, '^j+t-kPk 



k=0 



Y i'^j+t-k - Oij+t-k-l) Y 



a 



t+j- 



-1 Y (3k, 



k=0 



i=k+l 



k=j-j 



and this is bounded by C{(t + j)-^-2je+i _^ ^^_^j^_d-i^-e| < ^je-i^-d^ ^j^j^^^ 
with (7.25) proves (7.22) for d > 0, e < 0. Finally, for d < and all e 



Y, (^j+t-kPk 
k=j-j 

which with (7.26) completes the proof of (7.22). □ 



e-d-l 
Lemma 14. For |Co| < \,

(7.27) 



(7.28) 



j=0\t=l / 



oo / n 



Proof. In this and subsequent proofs we drop the zero subscript from ^o- 
We omit the proof for = as it is simple. From Lemma 13 

oo i oo 

The first sum is bounded by Ct^C+i ^^^^^ ^.j^g second by Cd''^^ Ylf=t < 
Ct-^ when C > and by CY.f^tJ"'^ < Ct~^ when C < 0, to prove (7.27). 
For j <n and C 7^ 



t 



3 
t=l 



<C7/-i^max(r^,t-?) + C/ ^ r^^^ 

t=j+i 



For j >n 



t 



Thus 



< Cmax(l,(jyn)^). 

n 

< ^ max(i-^, t-f) < Cmax(n/j, (n/jf ). 



EE =E EA.. + E EA. 



■it 



j=0\ t 



j=0 \ t 



j=n+l \ t 



<Cn + Cn^-^^ j"^^'^ ^ C'^' C > 0, 



]=n 



<Cn-'iY.3'^ + ^'T.r'<Cn, C<0, 



j=n 



to prove (7.28). □ 
Define 
Lemma 15. For < Co < f '^nd j > I,

(7.29) hjk < Cj~^/Vm(j~i/2^fc-^/2)^ l<k<n, 

(7.30) <Cj"^A;^»"^n^/2-^«min(//2^n^/2), k>n. 
For < Co < and j > 1, 

(7.31) 

0<e<i + Co,l<^<n, 

(7.32) < Ck~^ mm{n/j, log n), k>n. 

Proof. It follows from Lemma 13 that for 1 <k <n, 

k n 

(7.33) hjk < Ck'^-^Y.i^ + J)"^ max(A:-'^, r^) + Cfc^ ^(t + j)"^*"^"^ 

t=l t=k 

Suppose C > 0. The first term on the right-hand side is bounded by 

cr^k^~^Y.^~i-^cr\ j>k, 
t=i 

t=i 

The second term on the right-hand side of (7.33) is bounded by 

n 

cj-'k'^Y.^-<-'^cj-\ j>k, 

t=k 

n 

cr^'^k'^Y.^-^-''"<cm~"\ j<k. 

t=k 

This proves (7.29). Let $\zeta < 0$. The first term on the right-hand side of (7.33)
is bounded by

k 

Ck~^Y.(^ + ^)~^ ^ Cmm{r\k-Hogk) 
t=i 

and the second by 

oo 

ck^j-y^-' E < cr'/^~'k-'/'+', j > k, 

t=k 

n 

Ck^Y.^-'^"^ <Ck~\ j<k. 

t=k 
This proves (7.31). For k > n, (7.30) and (7.32) are readily deduced from

V < ck^-^Y.i-^ + jr 'i"^i(c > 0) + ck-^Y.^t + j)-H{c < 0). 



Lemma 16. For $|\zeta_0| < \tfrac{1}{2}$,



E 



2 

2 

51 ^il X! -^J*^ 
t j=0 



$\le C(\log n)^{3}$.



Proof. Writing 7(5; uq) = Y^^^^jS^ , the expression within the norm is 
—1 00 00 

(7.34) E E 

It+j^-j E ^kt^-k + E E ^jk^-j^-k, 
t j=l-t k=0 j,k=0 

where Hji^. = J2t Ij+t^kt- The squared norm of the first term has expectation 
bounded by 

EEf E ||7.+,llll7*+,ll)fEA..A,,). 

s t \j=max(l-s,l-t) / \fc=0 / 

For $s \le t$ the first bracketed factor is $O((t-s+1)^{-1}\log n)$ because $\|\gamma_j\| \le
C(j+1)^{-1}$, while the second one is bounded by

j=l j=s+l 



00 



+ C' E J^^~^max(j"^,s"^)max(j"^,t"^) 
j=t+i 

< C{s-k^'H{c > 0) + sk~^^H{c < 0) + (st)"^/^ ^(c = o)} 

We have

$$\sum_{s=1}^{t}(t-s+1)^{-1}s^{-1/2} \le \sum_{s=1}^{[t/2]}(t-s+1)^{-1}s^{-1/2} + \sum_{s=[t/2]}^{t}(t-s+1)^{-1}s^{-1/2} \le C(\log t)\,t^{-1/2}$$

$$C(\log n)\sum_{t}(\log t)\,t^{-1} \le C(\log n)^{3}.$$
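The rate claimed for the sum $\sum_{s=1}^{t}(t-s+1)^{-1}s^{-1/2}$ can be checked numerically. The following is only an illustrative sketch, not part of the proof: the function names and the grid of values of $t$ are chosen here for illustration. It computes the ratio of the sum to $(\log t)\,t^{-1/2}$ and shows that the ratio stays bounded as $t$ grows.

```python
import math

def split_sum(t):
    """Return sum_{s=1}^{t} (t - s + 1)^{-1} * s^{-1/2}."""
    return sum(1.0 / ((t - s + 1) * math.sqrt(s)) for s in range(1, t + 1))

def ratio(t):
    """Ratio of the sum to the rate (log t) * t^{-1/2}; requires t >= 2."""
    return split_sum(t) / (math.log(t) / math.sqrt(t))

# Illustrative grid of t values (an assumption, not taken from the paper).
ratios = {t: ratio(t) for t in (2, 10, 100, 1000, 10000)}
for t, value in ratios.items():
    print(t, round(value, 3))
```

A bounded ratio over the grid is consistent with a bound of the form $C(\log t)\,t^{-1/2}$ for $t \ge 2$; it does not, of course, replace the splitting argument at $s = [t/2]$.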

Next, since $|H_{jk}| \le C h_{jk}$, the squared norm of the second term on the right-hand
side of (7.34) has expectation bounded by

$$C\sum_{j,k=0}^{\infty}\left(h_{jk}^{2} + h_{jj}h_{kk} + h_{jk}h_{kj}\right).$$



We apply Lemma 15 to complete the proof. For $\zeta > 0$



n k n CO 

2 



j,k=0 k=lj=l k=lj=k 

OO Th OO OO 

k=nj=l k=nj=n 

<C(logn)2, 

OO n OO 

E hn <CYr'+ n^-^ J2 < C log n 

j=0 j=l j=n 



and 

OO n k n OO 

j,k=0 k=lj=l k=lj=k 

OO 

j,k=n 

<C(logn)2. 

For $\zeta < 0$

OO n k n CO 

E E h% < E E(^-' log kf + cYT. r'^'-k^'^'' 

j,k=0 k=lj=l k=lj=k 

OO n OO 

+ Cilognf E E fc"' + Cn' 

k=nj=l j,k=n 

<C(logn)3, 

OO n OO 

Y^n <CYr'+CnJ2r^< Clogn, 

j=0 j=l j=n 

OO n k 

E E ^-^^'^^ < ^ E E r'/'^^k-y^-^ log A; 

j,fc=o fc=ii=i 

n OO OO 

+ Clogn^^j-V2-.^-V2+e^-i + Cn2^^(jA:)' 

k=lj=n j,k=n 

< C(logn)2. 
Lemma 17.



Proof. We have 



E 



E4 



<C7(logn)V. 



n— 1 /n~j 



oo / j+n 



E4 = E Et. ^. + E E 7. 

t j=l \j=l / j=o\j=j+l 



Thus 



Since 



E^ti 



n-j 

E7. 

i=l 



<^ E 

\i=i 



n-j 

Et. 



2\ 2 



+ E 

\i=o 



E 7. 
i=J+l 



2\ 2 



n—j 

<E 117^11 <cE^"'^^i°g^' i^i< 

i=l i=l 



n. 



J+n 

E 7i <C ^ ri<Clogn, 
i=j+l «=i+i 

the proof is readily completed. □ 



1 < J < n, 
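The bounds used in this proof reduce, under a decay of the form $\|\gamma_i\| \le C i^{-1}$ (as used in the proof of Lemma 16), to elementary facts about blocks of the harmonic series. The sketch below is a numerical illustration only. It assumes that reading of the displays, and the helper names and the grid of $n$ are hypothetical. It checks that $\sum_{i=j+1}^{j+n} i^{-1} \le 1 + \log n$ for every $j \ge 0$, and that the same block sum is at most $n/j$ when $j \ge 1$, which is the decay needed for the sum over $j > n$ to converge.

```python
import math

def block_harmonic(j, n):
    """Return sum_{i=j+1}^{j+n} 1/i."""
    return sum(1.0 / i for i in range(j + 1, j + n + 1))

def worst_case(n, j_max):
    """Largest block sum over 0 <= j <= j_max."""
    return max(block_harmonic(j, n) for j in range(j_max + 1))

# For each n: (largest block sum over 0 <= j <= 3n, the bound 1 + log n).
checks = {n: (worst_case(n, 3 * n), 1.0 + math.log(n)) for n in (2, 10, 100, 1000)}
for n, (worst, bound) in checks.items():
    print(n, round(worst, 4), round(bound, 4))
```

The largest block sum is attained at $j = 0$, where it equals the $n$-th harmonic number.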



Lemma 18. For any sequence $c_j$, $j \ge 0$, and any r > 1, if $\mu_{r^+} < \infty$,



E 



oo 

E 



r/2 



r/r+ 



\j=0 



where r+ is the smallest even integer such that r+ > r.

Proof. For $r \le 2$ the proof follows by Jensen's inequality and direct
calculation. For $r > 2$ the Marcinkiewicz-Zygmund inequality indicates that

/ oo \r/2 



(7.35) 



E 



E ^3^-3 
j=0 



\j=0 



where $C_r = \{18 r^{3/2}(r-1)^{-1/2}\}^{r}$ (see [13], page 23). By the $c_r$-inequality (7.35)
is bounded by



Cr^/^-UE 



E^(-^.-i) 

i=o 



r/2 / 00 \r/2> 
2 1 



+ E4 

\j=0 



00 



1)^ 



r/4 / 00 \ r-ZZ ' 
2 \ 



+ E^? 

\J=0 






For $2 < r \le 4$ the first expectation in the last line is bounded by



r/4 



r/4 



r/2 



r/4 



j=0 



\j=Q 



\j=0 



For r > 4 we instead apply the $c_r$-inequality to that expectation, and then
the Marcinkiewicz-Zygmund inequality again, and so on, eventually bounding
(7.35) by



oo \ r/2 
2 



CrCr/2Cr/A ■ ■ ■ C2 ■ 2^/^ • 2'/^ • 2^/^ • • • 1 ^ 

\j=o 



i,r/r+ 



The result follows on noting that r ■ r^/^ • r^/^ ■ ■ ■ r^/*" < r^, 2^/^ • 2^/^ • • • 1 < 2, 
2i/2 . 4I/4 . . . ^i/r > g^j^jj _ 1) < 2 for all j > 2. □ 
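Moment bounds of this type control the $r$-th moment of a linear form $\sum_j c_j \varepsilon_{-j}$ by a power of $\sum_j c_j^2$. For $r = 4$ this can be verified exactly in a toy case, using the standard identity $E(\sum_j c_j e_j)^4 = 3(\sum_j c_j^2)^2 + (\mu_4 - 3)\sum_j c_j^4$ for i.i.d. zero-mean, unit-variance $e_j$ with fourth moment $\mu_4$. The sketch below is an illustration under assumptions that are not from the paper: Rademacher innovations (so $\mu_4 = 1$), ten terms, and hypothetical weights $c_j = (j+1)^{-3/4}$.

```python
import itertools
import math

def fourth_moment_rademacher(c):
    """Exact E(sum_j c_j * e_j)^4 for i.i.d. e_j = +/-1 with probability 1/2 each."""
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=len(c)):
        s = sum(cj * ej for cj, ej in zip(c, signs))
        total += s ** 4
    return total / 2 ** len(c)

def fourth_moment_formula(c, mu4):
    """3 (sum c_j^2)^2 + (mu4 - 3) sum c_j^4, for unit-variance i.i.d. e_j."""
    s2 = sum(cj ** 2 for cj in c)
    s4 = sum(cj ** 4 for cj in c)
    return 3.0 * s2 ** 2 + (mu4 - 3.0) * s4

# Hypothetical weights, truncated at 10 terms (assumption for illustration).
c = [(j + 1) ** -0.75 for j in range(10)]
exact = fourth_moment_rademacher(c)          # enumeration over all 2^10 sign patterns
closed = fourth_moment_formula(c, mu4=1.0)   # Rademacher: E e^4 = 1
bound = 3.0 * sum(cj ** 2 for cj in c) ** 2  # max(3, mu4) * (sum c_j^2)^2 with mu4 = 1
print(exact, closed, bound)
```

In this toy case the fourth moment is at most $3(\sum_j c_j^2)^2$, which has the same shape as the general bound with $r = r^+ = 4$; the constant here is specific to the example.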

Lemma 19. As $n \to \infty$,

||a(^)(£;/ao) - a(^)(e)|| = ©^(p^STrKL^n-i/^ + logn)). 

Proof. Because the proof is similar to details in Section 3 we sketch 
it. It turns out that {W'^^Xe / ao)'^ - W'^^\e)-'^}w^^\E / gq) dominates 

X {w(^) [E/aQ) - w'^^^ (e)}, so we look only at the former. || H^(^) {E/ao) 
is bounded by 



(7.36) Cn-^ 



EE|(^E'^'^*^«j +\Y.^k{et)5u^ 



(incorporating a term due to the mean-correction, which is of smaller order). 
Using (6.14), 

(7.37) <t>k{et)5u = E MeM'i{et)dt + i MetHii^t)dl 



We have 



E 



Ei'^fc(^*)'^^(^*) ~ E(f)k{eo)(p'i{eo)}dit 



<CE{Meo)'Pe{eo)}^J2^'^' 

t 

< Cffi2f,{k+e+K) logn. 



Replacing du by dt — du gives no greater bound, by virtue of (6.15) and (6.17). 
On the other hand,



(e+K) 



n 



l/2^ 
because J2t ^t = implies J2t dt = J2t ^t- Next



t 



Proceeding as in Section 6, this is Op{{C£)'^'^^~^'^ Hi^kfJ-re^ogn) , where is the 
smallest even integer exceeding n{i + K) + 2. It follows that 

k,i=l \ t ) 

Also 

(e E f E ^^^^^^T < E E 4 < E E ^i^^^f<^l 

I fc/=i \ t / ) 1=1 t e=i t 

and by proceeding as before this is Op{{CL)^'^^~^'^ p2KL^ogn). The proof is 
completed by application of Lemmas 8 and 10. □ 